LSSTDESC / TJPCov

TJPCov is a general covariance calculator interface to be used within LSST DESC
https://tjpcov.readthedocs.io
MIT License

Adding an example MPI script for NERSC #72

Closed mattkwiecien closed 1 year ago

mattkwiecien commented 1 year ago

Getting MPI and mpi4py to work at NERSC can be tricky. This PR adds a simple "runner" script for running TJPCov at NERSC with Slurm's srun.
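For readers who want to confirm that mpi4py is wired up before launching a real job, here is a minimal sketch. It is not the runner script from this PR: the function names and the round-robin split are assumptions made for illustration, and the script falls back to a single process when mpi4py cannot be imported.

```python
def split_tasks(tasks, rank, size):
    """Return the tasks assigned to `rank` in a round-robin split.

    Hypothetical helper for illustration; not necessarily how TJPCov
    distributes its work.
    """
    return [task for i, task in enumerate(tasks) if i % size == rank]


def get_rank_and_size():
    """Return (rank, size) from MPI, or (0, 1) if mpi4py is unavailable."""
    try:
        from mpi4py import MPI
    except ImportError:
        return 0, 1
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


if __name__ == "__main__":
    rank, size = get_rank_and_size()
    # Placeholder work items standing in for independent pieces of a job.
    my_tasks = split_tasks(list(range(10)), rank, size)
    print(f"rank {rank} of {size} handles tasks {my_tasks}")
```

Saved under a hypothetical name such as check_mpi.py, this could be launched with something like `srun -n 4 python check_mpi.py`. If every process reports "rank 0 of 1", mpi4py is not picking up the MPI launcher.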

While working on this, we discovered that many tests were failing. This looks like a numerical precision issue, probably caused by newer versions of software packages. For now we have increased the tolerances, but we should probably look into this.
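To illustrate what loosening a tolerance means in practice, here is a small sketch with made-up numbers. The values and tolerances are assumptions chosen for the example and are not taken from the TJPCov test suite.

```python
import math

# Made-up reference values and a recomputation that differs by a
# relative shift of about 3e-7, mimicking drift between package versions.
reference = [1.0, 2.5e-3, 4.0e-7]
computed = [r * (1 + 3e-7) for r in reference]


def all_close(a, b, rel_tol):
    """True if every pair of values agrees within the relative tolerance."""
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


strict = all_close(computed, reference, rel_tol=1e-9)  # tight tolerance
loose = all_close(computed, reference, rel_tol=1e-5)   # relaxed tolerance
print(f"strict: {strict}, loose: {loose}")
```

The relaxed check passes while the tight one fails, which is why raising tolerances makes the tests pass but also hides how large the drift really is.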

One of the consistently failing tests was caused by a race condition when writing workspaces with NaMaster: one process would delete the files before another process could read them.

This was fixed by implementing a GlobalLock object in the tools module. The new lock can be used as follows:

with GlobalLock():
    # Do exclusive access stuff

Functionally, this is a system-wide mutex implementation for Python.
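For context, one common way to build a system-wide mutex in Python is an advisory lock on a shared file. The sketch below is not the GlobalLock implementation from this PR. It assumes a POSIX system (it uses `fcntl.flock`), and the lock-file path and the `global_lock` name are hypothetical.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock-file location; every cooperating process must use the same path.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "example_global.lock")


@contextmanager
def global_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with global_lock():
        # Only one process at a time runs this section.
        print(f"process {os.getpid()} holds the lock")
```

The lock is advisory, so it only protects code paths that also acquire it. File locks can also behave differently on network file systems, which matters if processes on several nodes need to share the lock.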

carlosggarcia commented 1 year ago

@mattkwiecien if you're not using the datasets for tests, I think they would be better placed in the examples folder. That way we keep track of what's for testing.

mattkwiecien commented 1 year ago

Updated with the new GlobalLock object. Also updated the description. @carlosggarcia not sure why the Read the Docs (rtd) build is failing?