TUW-GEO / ismn

Readers for the data from the International Soil Moisture Network
https://ismn.earth/en/
MIT License

Improvement: don't use absolute path names in metadata cache #9

Open awst-baum opened 5 years ago

awst-baum commented 5 years ago

The ISMN reader seems to cache information about the ISMN data in a file called metadata.npy, which it puts in a subfolder called python_metadata.

metadata.npy contains absolute pathnames of the stm files, e.g. /path/to/mydata/ISMN/ISMN_V20180830_GLOBAL/AMMA-CATCH/Banizoumbou/AMMA-CATCH_AMMA-CATCH_Banizoumbou_sm_0.050000_0.050000_CS616-1_19780101_20180830.stm.

In our setup, several systems use the same NFS mount containing the ISMN data, and not all of them use the same mountpoint (e.g. one might have the data at /path/to/mydata and another at /some/completely/different/path/to/my/data). We also use Linux softlinks, so we might also access the data as /softlink/to/data/....

Once one of the systems has created the metadata folder, the other systems may produce errors if they access the data through a different path, because they can't find the stm files at the paths cached in metadata.npy.

Would it be possible to not use absolute file paths or make it possible in some other way to use the same ISMN folder from several systems?

wpreimes commented 5 years ago

Maybe add an option that lets the user choose where the metadata is stored and loaded from?
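
A rough sketch of what such an option could look like. The function name `resolve_metadata_dir` and the `meta_path` parameter are hypothetical and not part of the ismn package; only the python_metadata folder name comes from the report above, and its location directly under the data folder is assumed here.

```python
import os


def resolve_metadata_dir(data_root, meta_path=None):
    """Return the folder that metadata.npy is stored in and loaded from.

    meta_path is a hypothetical user option. If it is not given, fall
    back to a python_metadata subfolder of the data folder (assumed
    default location).
    """
    if meta_path is None:
        return os.path.join(data_root, "python_metadata")
    # Expand "~" and make the user-supplied location absolute.
    return os.path.abspath(os.path.expanduser(meta_path))
```

With something like this, each system could keep its own cache (e.g. `resolve_metadata_dir("/path/to/mydata/ISMN", meta_path="~/ismn_meta")`) while sharing the same data folder.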

wpreimes commented 5 years ago

Generally, I think the metadata handling could be improved, e.g. by raising a meaningful error when the data has been moved and the stored paths are no longer valid.
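
A minimal sketch of such a check, assuming the cached paths are available as a plain list of strings after loading the cache. `MetadataCacheError` and `check_cached_paths` are hypothetical names used for illustration, not existing ismn API.

```python
import os


class MetadataCacheError(IOError):
    """Hypothetical error: the metadata cache no longer matches the data."""


def check_cached_paths(cached_paths, data_root):
    """Raise a descriptive error if cached file paths do not exist anymore."""
    missing = [p for p in cached_paths if not os.path.isfile(p)]
    if missing:
        raise MetadataCacheError(
            f"{len(missing)} of {len(cached_paths)} files listed in the "
            f"metadata cache were not found, e.g. {missing[0]!r}. "
            f"The data in {data_root!r} may have been moved or may be "
            "accessed through a different path. Consider rebuilding the "
            "metadata cache."
        )
```

Running this once right after loading the cache would surface the problem early instead of failing later when an individual stm file is opened.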

cpaulik commented 5 years ago

I would just start with relative paths. That should be pretty straightforward to implement.
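
Something along these lines might be enough, as a sketch only. The helper names `to_relative` and `to_absolute` are made up for illustration, and the assumption is that the cache is written and read with the data folder (`data_root`) known at both times.

```python
import os


def to_relative(abs_paths, data_root):
    """Before caching: store paths relative to the data folder."""
    return [os.path.relpath(p, start=data_root) for p in abs_paths]


def to_absolute(rel_paths, data_root):
    """After loading: rebuild paths from whatever root this system uses."""
    return [os.path.join(data_root, p) for p in rel_paths]


# Hypothetical file name below the data folder, for illustration only.
cached = to_relative(
    ["/path/to/mydata/ISMN/AMMA-CATCH/Banizoumbou/example.stm"],
    "/path/to/mydata/ISMN",
)
# Another system reaches the same data through a softlink.
restored = to_absolute(cached, "/softlink/to/data/ISMN")
```

Since the cache then only holds the part of each path below the data folder, the mountpoint or softlink used by each system no longer matters.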

Custom metadata path might be a little bit trickier.