galsci / pysm

PySM 3: Sky emission simulations for Cosmic Microwave Background experiments
https://pysm3.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Passing CMB map files that live outside PYSM_LOCAL_DATA #102

Closed ziotom78 closed 2 years ago

ziotom78 commented 2 years ago

In the LiteBIRD Simulation Framework, we generate a set of CMB maps and pass them to PySM3, instead of using the maps produced by PySM3 itself (see the code here).

Since we generate the CMB maps in our code, they are saved outside the directory where PySM3 keeps its templates. However, pysm3.CMBMap expects relative paths rather than absolute ones. To let pysm3.CMBMap find our maps, our code overrides PYSM_LOCAL_DATA every time we need to include the CMB in the simulated sky. This is not optimal, as in our CI pipeline we need PYSM_LOCAL_DATA to point to the folder where we store a cache of the templates: obviously, this setting is overwritten whenever we include any CMB signal in the sky!

@NicolettaK and I have thought about possible ways to solve this:

  1. Avoid including the CMB signal when running the tests. We have decided to follow this route, but we fear that the problem might appear again in the future, if somebody wants to run our pipeline on a server where PySM3 templates are supposed to be kept locally.
  2. Save the value of PYSM_LOCAL_DATA before overriding it, and then restore the old value once the CMB sky has been generated. However, this is tricky to implement robustly when running several MPI processes.
  3. Prepend/append the directory where we have saved our CMB maps to PYSM_LOCAL_DATA instead of overwriting the variable completely. This would work if this variable worked similarly to the UNIX PATH variable, i.e., a colon-separated list of directories like /storage/PySM3/templates:/litebird_CMB_maps. Unfortunately, it seems that this behavior is not supported, as pysm3/utils/data.py contains the following line:

     self.data_folders.append(os.environ["PYSM_LOCAL_DATA"])

    instead of

     self.data_folders += os.environ["PYSM_LOCAL_DATA"].split(os.pathsep)
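
For illustration, options 2 and 3 could be sketched roughly as follows. The helper names `temporary_local_data` and `split_data_folders` are hypothetical and are not part of PySM3; the first handles a single process only and does nothing to coordinate MPI processes.

```python
import os
from contextlib import contextmanager

ENV_VAR = "PYSM_LOCAL_DATA"


def split_data_folders(value):
    """Option 3: split a PATH-like string into folders, dropping empty entries."""
    return [folder for folder in value.split(os.pathsep) if folder]


@contextmanager
def temporary_local_data(folder):
    """Option 2: point PYSM_LOCAL_DATA at `folder`, then restore the old state."""
    saved = os.environ.get(ENV_VAR)  # None if the variable was not set
    os.environ[ENV_VAR] = str(folder)
    try:
        yield
    finally:
        if saved is None:
            # The variable did not exist before, so remove it again.
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = saved
```

The context manager restores the previous value even if the code inside the `with` block raises an exception.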

Does any of this sound reasonable? Or are there other simpler and viable solutions we are missing?

zonca commented 2 years ago

Thanks @ziotom78. This seems like a common use case, so it is probably useful to support it across pysm by changing the RemoteData class.

What if I check whether a path is absolute with https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.is_absolute and, if so, just return it, disregarding local data?

This still wouldn't support the case where you want to provide a path relative to the current folder, but I am not sure how to distinguish that from a path relative to the LOCAL_DATA folder.
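
A minimal sketch of this check, assuming a hypothetical helper `resolve_data_path` and a plain list of local folders. This is not the actual RemoteData code, and it leaves out any download fallback.

```python
from pathlib import Path


def resolve_data_path(filename, local_folders):
    """Return `filename` unchanged if absolute, else search `local_folders`."""
    path = Path(filename)
    if path.is_absolute():
        # Absolute paths bypass the local data folders entirely.
        return path
    for folder in local_folders:
        candidate = Path(folder) / path
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"{filename} not found in {list(local_folders)}")
```

With this logic, a relative path is always interpreted relative to the data folders, never to the current working directory, which is the ambiguity mentioned above.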

ziotom78 commented 2 years ago

This idea is probably the best, as it applies the principle of least surprise!

As for using a path relative to the current folder, one can simply convert the relative path to an absolute one before passing it to PySM.
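
A minimal sketch of that conversion, using only the standard library; `to_absolute` is a hypothetical helper name, and the returned string is what one would then hand to pysm3.CMBMap.

```python
from pathlib import Path


def to_absolute(filename):
    """Anchor a relative path at the current working directory."""
    # Path.absolute() prepends the current directory and leaves
    # already-absolute paths unchanged; it does not resolve symlinks.
    return str(Path(filename).absolute())
```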

NicolettaK commented 2 years ago

Thanks @zonca and @ziotom78, the solution you propose sounds good to me.

zonca commented 2 years ago

OK, I've created a unit test that I expected to fail, but it seems it is actually working already for a full path. Can you please double-check in https://github.com/galsci/pysm/pull/107?

ziotom78 commented 2 years ago

Hi @zonca, sorry for the delay. @NicolettaK and I have tested the use of Path.absolute() and can confirm that it works: if the path starts with /, the right file is loaded by PySM.

Thanks a lot!