Problem

Our tests contain examples of parametrizing over different datasets by calling the data loader functions inside the pytest.mark.parametrize decorator, for example https://github.com/alchemistry/alchemlyb/blob/72622c99c16c74efd1f8517cb15ef21e16d6ebda/src/alchemlyb/tests/test_preprocessing.py#L66-L69

The data loader functions gmx_benzene_dHdl() and gmx_benzene_u_nk() are called when pytest collects the tests, i.e., before any test runs. This slows down the setup phase substantially (collection runs serially). It can also fill up memory, because every dataset is loaded up front and is held by the collected test items.
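The root cause is plain Python semantics: decorator arguments are evaluated when the decorated function is defined, which for a test module means when pytest imports it during collection. The sketch below shows this without pytest. `parametrize` here is a hypothetical minimal stand-in for pytest.mark.parametrize that only stores its already-evaluated arguments, and `load_dataset` is a hypothetical stand-in for an expensive loader.

```python
calls = []

def load_dataset():
    # Hypothetical stand-in for an expensive loader such as gmx_benzene_dHdl()
    calls.append("loaded")
    return list(range(1000))

def parametrize(values):
    # Hypothetical stand-in for pytest.mark.parametrize: it just attaches
    # the (already evaluated) parameter values to the decorated function.
    def decorator(func):
        func.params = values
        return func
    return decorator

@parametrize([load_dataset()])   # load_dataset() runs HERE, at definition time
def check_size(data):
    assert len(data) == 1000

# The loader has already run although check_size() was never called.
print(calls)  # → ['loaded']
```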

Solution

Make sure that the loader function is only evaluated inside the test.

Use pytest's getfixturevalue

Use request.getfixturevalue(fixture_name) to evaluate the fixture named fixture_name dynamically, from inside the running test, and obtain its value.

This should probably look like:

    @pytest.mark.parametrize(('dataloader', 'size'), [('gmx_benzene_dHdl', 661),
                                                      ('gmx_benzene_u_nk', 661)])
    def test_basic_slicing(self, dataloader, size, request):
        # Resolve the fixture by name at run time, not at collection time
        data = request.getfixturevalue(dataloader)
        assert len(self.slicer(data, lower=1000, upper=34000, step=5)) == size

Note that in the example above, dataloader is the name (a string) of a pytest fixture, not an ordinary function as currently used in our tests.

Current hacky solution in alchemlyb

An alternative is to pass only the function objects (not their return values) to a parametrized fixture and call them inside the fixture itself, as shown, for example, in https://github.com/alchemistry/alchemlyb/blob/72622c99c16c74efd1f8517cb15ef21e16d6ebda/src/alchemlyb/tests/test_fep_estimators.py#L151-L168 This approach ensures that data is loaded only when needed and that loading can be done in parallel.

TODO

[x] find all instances of up-front data loading
[x] replace with code that uses getfixturevalue
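Remaining instances of up-front loading can be located mechanically. The helper below is a hypothetical heuristic (not part of alchemlyb): it parses a test module with Python's `ast` module (Python 3.9+ for `ast.unparse`) and reports every function call that appears inside the argvalues of a `*.mark.parametrize(...)` decorator. It will also flag harmless calls such as `pytest.param(...)`, so hits need manual review.

```python
import ast

def find_eager_parametrize_calls(source):
    """Return (lineno, call_text) for calls inside parametrize argvalues."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Match decorator calls like pytest.mark.parametrize(...)
        if not (isinstance(node, ast.Call)
                and ast.unparse(node.func).endswith("mark.parametrize")):
            continue
        # argvalues is the 2nd positional argument or the 'argvalues' keyword
        values = list(node.args[1:])
        values += [kw.value for kw in node.keywords if kw.arg == "argvalues"]
        for value in values:
            for sub in ast.walk(value):
                if isinstance(sub, ast.Call):
                    hits.append((sub.lineno, ast.unparse(sub)))
    return hits

SOURCE = '''
import pytest

@pytest.mark.parametrize("data", [gmx_benzene_dHdl(), gmx_benzene_u_nk()])
def test_a(data):
    pass

@pytest.mark.parametrize("name", ["gmx_benzene_dHdl"])
def test_b(name, request):
    pass
'''

print(find_eager_parametrize_calls(SOURCE))
# → [(4, 'gmx_benzene_dHdl()'), (4, 'gmx_benzene_u_nk()')]
```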

[TST] avoid evaluating data loader functions in parametrized tests at set-up time #206