alchemistry / alchemlyb

the simple alchemistry library
https://alchemlyb.readthedocs.io
BSD 3-Clause "New" or "Revised" License
191 stars 49 forks source link

[TST] avoid evaluating data loader functions in parametrized tests at set-up time #206

Closed orbeckst closed 1 year ago

orbeckst commented 2 years ago

Problem

Our tests contain examples of parametrizing over different datasets by calling the data loader function in the pytest.mark.parametrize such as https://github.com/alchemistry/alchemlyb/blob/72622c99c16c74efd1f8517cb15ef21e16d6ebda/src/alchemlyb/tests/test_preprocessing.py#L66-L69

The data loader functions gmx_benzene_dHdl() and gmx_benzene_u_nk() are called at the time when pytest collects tests. This slows down the test setup phase substantially (it's done in serial) and it also has the potential to fill up memory.

Solution

Make sure that the loader function is only evaluated inside the test.

Use pytest's getfixturevalue

Use request.getfixturevalue(fixture_name) to dynamically run the fixture function named fixture_name

Should probably look like

@pytest.mark.parametrize(('dataloader', 'size'), [('gmx_benzene_dHdl', 661), 
                                                  ('gmx_benzene_u_nk', 661)]) 
     def test_basic_slicing(self, dataloader, size, request):
         data = request.getfixturevalue(dataloader)
         assert len(self.slicer(data, lower=1000, upper=34000, step=5)) == size 

Note that in the example above, the dataloader is the name of a pytest fixture and not an ordinary function (as currently implemented in our tests).

Current hacky solution in alchemlyb

Instead, only pass the function objects to a parametrized fixture and then evaluate inside the parametrized fixture itself, as shown, for example in https://github.com/alchemistry/alchemlyb/blob/72622c99c16c74efd1f8517cb15ef21e16d6ebda/src/alchemlyb/tests/test_fep_estimators.py#L151-L168 This approach ensures that data is loaded when needed and can be done in parallel.

TODO

orbeckst commented 2 years ago

Note (from https://github.com/alchemistry/alchemlyb/pull/199#issuecomment-1177398645 ) that the data-loading functions will have to be converted to fixtures and then won't be usable as functions anymore.