hbrincon closed this issue 6 months ago
@hbrincon VoidFinder should not raise any errors in normal execution, so you can be sure the algorithm is not working correctly in this case.
I'll see if I can get the data from Kelly to run with this - but can you provide:
- Minimum reproducing code (did you change anything in Gadget_VoidFinder_periodic.py or is it fresh off a git branch?)
- Python environment information (pip list, conda list -n myenv, etc.)
- VAST branch/commit hash you were running with
- Operating system you were running with
- Compiler version used to build VAST
If you need help obtaining this information in your environment, Kelly and I can help.
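For anyone gathering the same details, here is a minimal sketch of a script that collects most of them in one go. It uses only the Python standard library; the function names `run` and `collect_report` and the `repo_dir` parameter are hypothetical, and it assumes `git` and `gcc` are on the PATH (missing tools are reported as `None`). It lists pip-visible packages only, so a conda environment would still need `conda list`.

```python
import platform
import shutil
import subprocess
from importlib import metadata


def run(cmd, cwd=None):
    """Return the first line of a command's stdout, or None if it is unavailable."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(
            cmd, cwd=cwd, capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0] if lines else None


def collect_report(repo_dir="."):
    """Gather environment details; repo_dir is assumed to be the VAST checkout."""
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    )
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "commit": run(["git", "rev-parse", "HEAD"], cwd=repo_dir),
        "compiler": run(["gcc", "--version"]),
        "packages": packages,
    }


report = collect_report()
for key in ("python", "os", "commit", "compiler"):
    print(f"{key}: {report[key]}")
print(f"{len(report['packages'])} packages installed")
```

Note that the compiler on the PATH is not necessarily the one VAST was built with, so that field is worth double-checking by hand.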
Here is the info (edited to add compiler version):
- Code that gathers the survey data (should be run on NERSC): In void analysis repo
- Reproducing code for VoidFinder (unchanged beyond the bounding volume and survey name): In void analysis repo
- Python environment (I'm unsure if Python at NERSC by default uses the environment specified by pip or conda, but I'm including the pip one): In void analysis repo
- Commit/hash: master 699d2f062c728f15c35fb237834e5b6f161269ba
- OS: SUSE Linux Enterprise Server 15 SP4 (Release 15.4H)
- Compiler version: gcc version 11.2.0 20210728 (Cray Inc.) (GCC)
Discovered a number of issues with the Python 'mmap' object, which provides the necessary shared memory resource for VoidFinder, and opened a corresponding bug report on the CPython GitHub: https://github.com/python/cpython/issues/115635
May have found a temporary workaround until CPython can fix this. The issue is that the CPython mmap object doesn't like to do anything while other Python objects reference it. The workaround is to delete the mmap object and then use the file descriptor to open a new mmap object and continue as normal. I don't anticipate a major performance hit, but this is a bit stupid.
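As a rough illustration of that pattern (not the actual code in VAST or in the PR), the sketch below maps a temporary file, shows that CPython refuses to close the map while a `memoryview` still references it, then drops the reference, deletes the map, and opens a new, larger map from the same file descriptor. The function name `demo_remap` and the sizes are made up for the example, and the `memoryview` stands in for whatever objects hold a reference in the real code.

```python
import mmap
import os
import tempfile


def demo_remap(old_size=4096, new_size=8192):
    """Illustrative only: swap a mapping for a larger one via its file descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, old_size)
        mm = mmap.mmap(fd, old_size)
        mm[:5] = b"hello"

        # Another object now references the map's buffer.
        view = memoryview(mm)
        try:
            mm.close()
            blocked = False
        except BufferError:
            # CPython refuses to close a map with exported buffers.
            blocked = True

        # Workaround: drop the references, discard the old map,
        # grow the file, and map it again from the same descriptor.
        view.release()
        mm.close()
        del mm
        os.ftruncate(fd, new_size)
        mm = mmap.mmap(fd, new_size)

        result = (blocked, len(mm), bytes(mm[:5]))
        mm.close()
        return result
    finally:
        os.close(fd)
        os.unlink(path)


blocked, new_len, head = demo_remap()
print(blocked, new_len, head)
```

The data written through the first map is still visible through the second because both are shared mappings of the same file, which is why the swap can be done without copying.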
PR #110
Running VAST/example_scripts/Gadget_VoidFinder_periodic.py with an appropriate bounding volume on the 2Gpc Abacus Simulation hugebase halo catalog results in a continuous stream of KeyErrors (see attached image). It’s unclear if the algorithm is continuing to work despite these errors.