Is it possible to get the test data created in RAM instead of on disk?
It seems like it should be possible. One possibility (?): change `boututils.datafile.DataFile.__init__()` so that it gets the implementation from a dict that we can then monkey-patch a new implementation into. Or write an in-memory implementation of `boututils.datafile.DataFile` and monkey-patch it onto `DataFile`.

Edit: Although the part I haven't figured out with this is how to pass the 'file' to `collect`... That could actually be a show-stopper.
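For what it's worth, a minimal sketch of what the monkey-patch route could look like in a test, assuming a dict-backed fake that only implements the handful of methods the tests need. The fake's `read`/`write`/`list` interface and the `boutdata.collect.DataFile` patch target are assumptions here, not the real API:

```python
from unittest import mock


class FakeDataFile:
    """Dict-backed stand-in for boututils.datafile.DataFile (assumed interface)."""

    _store = {}  # shared so "writing a file" and re-"opening" it see the same data

    def __init__(self, filename, *args, **kwargs):
        self.filename = filename
        self._vars = self._store.setdefault(filename, {})

    def write(self, name, value):
        self._vars[name] = value

    def read(self, name):
        return self._vars[name]

    def list(self):
        return list(self._vars)

    def close(self):
        pass


def test_collect_in_memory():
    # Patch the name that the code under test looks up; the exact target
    # string would need checking against boutdata's imports.
    with mock.patch("boutdata.collect.DataFile", FakeDataFile):
        ...  # build fake dump "files" via FakeDataFile and call collect() here
```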
Edit2: The way we got around this for xbout was to handle a list of `xarray.Dataset` passed to the `datapath` argument of `open_boutdataset()`. We could possibly also do this for `collect` by allowing a list of `DataFile` to be passed - not sure how much of `collect()`'s logic this would bypass...
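A rough sketch of that xbout-style approach, showing only the shape of the call; real BOUT++ dumps need the full set of metadata variables, which are omitted here, so this exact snippet would not open cleanly as written:

```python
import numpy as np
import xarray as xr
from xbout import open_boutdataset

# One Dataset per (fake) processor, built entirely in memory; the variable
# name and shape here are placeholders, not a valid BOUT++ dump layout.
datasets = [
    xr.Dataset({"n": (("t", "x", "y", "z"), np.zeros((2, 4, 4, 4)))})
    for _ in range(4)
]

# Per the comment above, xbout's open_boutdataset() can take in-memory
# Datasets via the datapath argument, so no files ever hit the disk.
ds = open_boutdataset(datapath=datasets)
```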
There's this package https://pypi.org/project/memory-tempfile/, which might allow doing file I/O on a RAM disk (on Linux; it seems to be possible to provide a fall-back to disk-based I/O for other OSes).
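As an illustration of the idea using only the standard library (rather than memory-tempfile's own API): put the temporary test directory on a tmpfs mount such as /dev/shm when one is available and writable, and fall back to the normal on-disk temp directory otherwise:

```python
import os
import tempfile


def ram_backed_tempdir(prefix="boutdata-tests-"):
    """TemporaryDirectory on tmpfs (/dev/shm) if available, otherwise on disk."""
    ram_root = "/dev/shm"
    if os.path.isdir(ram_root) and os.access(ram_root, os.W_OK):
        return tempfile.TemporaryDirectory(prefix=prefix, dir=ram_root)
    return tempfile.TemporaryDirectory(prefix=prefix)


with ram_backed_tempdir() as tmpdir:
    print("test data would be written under", tmpdir)
```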
For me they are even slower:
====================== 591 passed in 62823.58s (17:27:03) ======================
I got it to run faster locally on my laptop with everything on a RAM disk; there it finished in 10 hours. But that might be due to some failures and it not finishing:
============================= test session starts ==============================
platform linux -- Python 3.12.0, pytest-7.4.2, pluggy-1.3.0
rootdir: /tmp/boutdata
plugins: anyio-3.7.1
collected 591 items
boutdata/tests/test_boutoptions.py ................ [ 2%]
boutdata/tests/test_collect.py ......................................... [ 9%]
........................................................................ [ 21%]
........................................................................ [ 34%]
........................................................................ [ 46%]
........................................................................ [ 58%]
........................................................................ [ 70%]
...............................................F..F..F..F..F..F..F..F..F [ 82%]
..F..F..
real 648m38.772s
user 499m23.204s
sys 101m39.977s
This is on Fedora Rawhide; on Fedora 38 they are still "fast" and finish within a reasonable time frame ...
Removing `gc.collect()` from squashing makes it much faster ...
Is that there as a workaround for xarray's aggressive caching?
boutdata should not be using xarray?
I think it was put there as a (premature?) optimisation to minimise memory usage ...
Coming back to the original thread: I doubt moving I/O to a ramdisk or similar would help, since we spend 30 minutes in user space and only 1 minute in sys:
real 31m17.721s
user 30m35.586s
sys 1m3.874s
The unit tests take about 30 minutes to complete; mostly it's `collect`. Maybe we can mock out the netCDF calls with something much faster?
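One possible direction for that, assuming the test data ultimately goes through netCDF4: the library can keep a file entirely in memory with `diskless=True`, so the existing code paths could in principle be pointed at RAM rather than fully mocked. Whether boutdata/boututils expose a hook to pass that flag through is a separate question; this just shows the underlying netCDF4 feature:

```python
import numpy as np
from netCDF4 import Dataset

# diskless=True keeps the "file" in memory; nothing is written to the filesystem.
nc = Dataset("in_memory.nc", "w", diskless=True)
nc.createDimension("x", 4)
var = nc.createVariable("n", "f8", ("x",))
var[:] = np.arange(4.0)
print(nc["n"][:])
nc.close()
```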