astropy / halotools

Python package for studying large scale structure, cosmology, and galaxy evolution using N-body simulations and halo models
http://halotools.rtfd.org
BSD 3-Clause "New" or "Revised" License

Tests are shaky in parallel invocation #1101

Open Hellseher opened 1 month ago

Hellseher commented 1 month ago

Hi,

While preparing Halotools for the Guix package index, I noticed that tests fail randomly when --numprocesses is passed to run them as parallel jobs with the pytest-xdist plugin. This does not happen with a single job.

Round 1
FAILED halotools/sim_manager/tests/test_halo_table_cache.py::TestHaloTableCache::test_determine_log_entry_from_fname
FAILED halotools/sim_manager/tests/test_ptcl_table_cache_log_entry.py::TestPtclTableCacheLogEntry::test_scenario4
FAILED halotools/sim_manager/tests/test_ptcl_table_cache.py::TestPtclTableCache::test_add_entry_to_cache_log1
FAILED halotools/sim_manager/tests/test_halo_table_cache.py::TestHaloTableCache::test_remove_entry_from_cache_log
FAILED halotools/sim_manager/tests/test_ptcl_table_cache.py::TestPtclTableCache::test_add_entry_to_cache_log3
FAILED halotools/sim_manager/tests/test_ptcl_table_cache.py::TestPtclTableCache::test_determine_log_entry_from_fname1
FAILED halotools/sim_manager/tests/test_halo_table_cache.py::TestHaloTableCache::test_add_entry_to_cache_log
FAILED halotools/sim_manager/tests/test_halo_table_cache.py::TestHaloTableCache::test_update_cached_file_location
FAILED halotools/sim_manager/tests/test_ptcl_table_cache.py::TestPtclTableCache::test_determine_log_entry_from_fname2
FAILED halotools/sim_manager/tests/test_ptcl_table_cache.py::TestPtclTableCache::test_determine_log_entry_from_fname3
FAILED halotools/sim_manager/tests/test_user_supplied_ptcl_catalog.py::TestUserSuppliedPtclCatalog::test_add_ptclcat_to_cache4
FAILED halotools/sim_manager/tests/test_ptcl_table_cache_log_entry.py::TestPtclTableCacheLogEntry::test_passing_scenario

Round 2
FAILED halotools/sim_manager/tests/test_ptcl_table_cache_log_entry.py::TestPtclTableCacheLogEntry::test_scenario2a
FAILED halotools/sim_manager/tests/test_ptcl_table_cache_log_entry.py::TestPtclTableCacheLogEntry::test_scenario2c
FAILED halotools/sim_manager/tests/test_user_supplied_ptcl_catalog.py::TestUserSuppliedPtclCatalog::test_add_ptclcat_to_cache6
FAILED halotools/sim_manager/tests/test_user_supplied_halo_catalog.py::TestUserSuppliedHaloCatalog::test_add_halocat_to_cache1

Inputs:

aphearin commented 1 month ago

Thanks for reporting this issue. I've never run the halotools test suite in parallel, so I had not noticed this before. All of the failing tests you show appear to involve the (ad hoc) system the library uses to store and create a persistent memory of simulation data. In the test suite, the code creates some tiny simulation data, creates a log entry for the fake sims, and then runs tests on the logging mechanisms. Errors when these tests run in parallel make me think that some workers may be running tests on fake simulation data that has not been created yet, or something like that. This would be harmless in terms of the performance of the source code, although I realize that's annoying for purposes of parallel testing. Do you have a workaround?

Hellseher commented 1 month ago

Hi,

Thank you for the detailed reply.

I have not gone far in investigating possible solutions yet; for now I have packaged it with pytest-xdist disabled. In my experience, some related packages (astropy, asdf) have test suites that are quite safe to run in parallel, which is a benefit in CI.

One potential solution would be to consolidate each create/test pair into a single unit test.
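
A minimal sketch of that idea, not taken from the halotools code base: the fake data and the log are created and checked inside one test, in a directory that belongs to that test alone, so no other worker can observe a half-written state. The helpers `add_entry_to_log` and `read_log`, the file names, and the one-line-per-entry log format are all hypothetical stand-ins for the real cache-log machinery. `tmp_path` is pytest's built-in per-test temporary directory fixture.

```python
import tempfile
from pathlib import Path


def add_entry_to_log(log_path, fname):
    """Append one file name to a plain-text log (hypothetical format)."""
    with open(log_path, "a") as f:
        f.write(fname + "\n")


def read_log(log_path):
    """Return the logged file names, or an empty list if no log exists."""
    if not log_path.exists():
        return []
    return log_path.read_text().splitlines()


def test_add_entry_to_cache_log(tmp_path):
    # Create step: fake data and log live in a directory private to this test.
    fake_sim = tmp_path / "fake_halos.hdf5"  # hypothetical file name
    fake_sim.write_bytes(b"")
    log_path = tmp_path / "cache_log.txt"  # hypothetical log name

    # Test step: runs in the same test, so it never depends on another worker.
    add_entry_to_log(log_path, str(fake_sim))
    assert read_log(log_path) == [str(fake_sim)]


if __name__ == "__main__":
    # Run the test without pytest by supplying a temporary directory by hand.
    with tempfile.TemporaryDirectory() as d:
        test_add_entry_to_cache_log(Path(d))
```

Under pytest the `tmp_path` argument is filled in automatically, and each test gets its own directory even when pytest-xdist runs several workers at once. Whether the real cache log location can be redirected this way is an open question that this sketch does not answer.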

Thanks, Oleg