ISMN reader produces different indices for same stations on different systems

TUW-GEO / ismn

Readers for the data from the International Soil Moisture Network

https://ismn.earth/en/

MIT License


ISMN reader produces different indices for same stations on different systems #5

Closed. awst-baum closed this issue 6 years ago.

awst-baum commented 6 years ago

Copied from https://github.com/TUW-GEO/pytesmo/issues/143

As the title says: the ISMN reader (in particular get_dataset_ids) produces different indices for the same stations on different systems. This makes it harder to reproduce errors from a production system on a developer machine. I think this comes from the metadata collector (pytesmo.io.ismn.metadata_collector.collect_from_folder), line 57, which uses os.walk. os.walk doesn't guarantee an order, but we could sort the folder and file lists alphabetically. This would take care of the problem, barring locale issues (different locales can sort differently).
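To illustrate the sorting idea, here is a minimal sketch. The helper name walk_sorted is hypothetical and this is not the actual collect_from_folder code; it only shows how the lists returned by os.walk could be sorted so that traversal order is the same on every system.

```python
import os


def walk_sorted(root):
    """Yield (dirpath, dirnames, filenames) like os.walk, in a fixed order.

    Hypothetical helper for illustration only.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place also fixes the order in which os.walk
        # descends into subfolders (this relies on topdown=True, the default).
        dirnames.sort()
        filenames.sort()
        yield dirpath, dirnames, filenames
```

Plain string sorting in Python compares Unicode code points rather than using the locale, so this ordering should not change between locales as long as the names themselves are identical.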

cpaulik commented 6 years ago

Yes, the reason should be os.walk. We could also sort the numpy array that is the result of the metadata collector. Both approaches should work.
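A rough sketch of the second approach, sorting the collected metadata itself. The field names (network, station, filename) and the function sort_metadata are assumptions made for illustration; the real structured array produced by the metadata collector may have a different layout.

```python
import numpy as np

# Assumed dtype for illustration; the real collector's fields may differ.
METADATA_DTYPE = [("network", "U32"), ("station", "U32"), ("filename", "U64")]


def sort_metadata(metadata, keys=("network", "station", "filename")):
    """Return a copy of a structured array sorted by the given fields."""
    # np.sort compares the listed fields in order; any remaining fields
    # are used as tie-breakers in the order they appear in the dtype.
    return np.sort(metadata, order=list(keys))


# The input order stands in for whatever order os.walk happened to produce.
collected = np.array(
    [
        ("SCAN", "StationB", "file2.stm"),
        ("REMEDHUS", "StationA", "file1.stm"),
        ("SCAN", "StationA", "file3.stm"),
    ],
    dtype=METADATA_DTYPE,
)
stable = sort_metadata(collected)
```

Since the dataset index is just the row position, sorting the array once after collection would give the same indices regardless of how the rows were gathered.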

awst-baum commented 6 years ago

Yup. Do you see an advantage for one of the approaches? I have no idea if one of them would be more performant.

I've got a patch for this sitting on my hard disk; I just need to fork this repo and create a pull request. Admittedly, the largest part of the patch is the unit test ;-)

cpaulik commented 6 years ago

Ordering the resulting metadata might make it more future-proof and consistent if the metadata is not coming from filenames but from, e.g., an ISMN API call. We talked about this a little bit during the kickoff. I am writing this here mainly to spread the information. This is so far off that we should not over-complicate the implementation now.

cpaulik commented 6 years ago

Fixed by #6