MLMI2-CSSI / foundry

Simplifying the discovery and usage of machine-learning ready datasets in materials science and chemistry
MIT License

Investigate HDF5 data location bug #391

Open ascourtas opened 1 year ago

ascourtas commented 1 year ago

As a Data Consumer, I want to ensure the data files for the datasets I download all go to the same general location.

Jingrui has found a bug where, for HDF5 datasets only, extra copies of the data files are downloaded to ./data in addition to their proper locations. This isn't a game-breaking bug, but it is untidy and should be resolved.

Track down the bug and resolve it so that all data go only to the appropriate folder, in the format ./data/dataset_name_vx.x/

From Jingrui: "Here is one about downloading extra instance of data files that I just reported to Ben and I'd share the details here. When downloading a dataset, Foundry should create directory and save the data at ./data/dataset_name_vx.x/ . While for the h5 datasets, mostly published by me and there's also one published by someone else, extra copies of the files are saved at ./data/ . All the tabular datasets I tested are fine."
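
To help confirm the report above (and later verify a fix), here is a minimal sketch of a check for stray HDF5 copies. It uses only the standard library and does not call any Foundry API. The function name `find_stray_hdf5` and the `.h5`/`.hdf5` extension list are assumptions for illustration; `./data` is the download root described in this issue.

```python
from pathlib import Path

# Assumed file extensions for HDF5 data; adjust if datasets use others.
HDF5_SUFFIXES = {".h5", ".hdf5"}


def find_stray_hdf5(data_root="./data"):
    """Return HDF5 files sitting directly in data_root.

    Files inside a dataset subfolder (e.g. ./data/dataset_name_vx.x/)
    are in their expected location and are not reported.
    """
    root = Path(data_root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in HDF5_SUFFIXES
    )


if __name__ == "__main__":
    # Hypothetical usage: run after downloading an HDF5 dataset.
    for path in find_stray_hdf5():
        print(f"Unexpected copy outside a dataset folder: {path}")
```

After downloading one of the affected h5 datasets, an empty result would suggest the extra copies are no longer being written to ./data.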


Acceptance criteria

  1. The bug has been resolved and the data are only downloaded to the appropriate folder for that dataset, as described above
  2. The fix is released
  3. Jingrui has been alerted to the fix

kjschmidt913 commented 1 year ago

Is this where the metadata conversation is coming from? If so, can someone rewrite this story? Or write a new one for it and add it to the sprint?

blaiszik commented 8 months ago

Is this error still occurring? I wonder whether the path fixes I made to the HTTPS download fixed this. @aristana @kjschmidt913