ChEB-AI / python-chebai

GNU Affero General Public License v3.0

Data migration #37

Closed aditya0by0 closed 1 month ago

aditya0by0 commented 2 months ago
  • [x] When initialising a dataset, the user can optionally provide the path to a CSV file that lists ChEBI IDs and their assignment to a split (train, validation or test). In that case, the provided split is used instead of creating a new one.
  • [x] When initialising the dataset without such a file, the splits are created automatically (as before) and the resulting split is saved as a CSV file.
  • [x] When running the migration script, the ChEBI data files are copied into the new structure. The existing split files are combined into one file, and a CSV file with the split assignment is created in addition.
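To illustrate the first two points, here is a minimal sketch of the "use the provided split, otherwise create and save one" logic. It is not the code from this branch: the function names, the `id`/`split` column names and the 80/10/10 ratio are assumptions made for the example.

```python
import csv
import random

SPLITS = ("train", "validation", "test")


def read_split(csv_path):
    """Read an ID -> split mapping from a CSV with 'id' and 'split' columns (assumed layout)."""
    with open(csv_path, newline="") as f:
        assignment = {row["id"]: row["split"] for row in csv.DictReader(f)}
    unknown = set(assignment.values()) - set(SPLITS)
    if unknown:
        raise ValueError(f"Unknown split names in {csv_path}: {sorted(unknown)}")
    return assignment


def write_split(assignment, csv_path):
    """Save an ID -> split mapping so that the same split can be reused later."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "split"])
        writer.writerows(assignment.items())


def get_split(ids, save_path, split_csv=None, fractions=(0.8, 0.1), seed=0):
    """Return the provided split if split_csv is given, else create one and save it."""
    if split_csv is not None:
        return read_split(split_csv)

    # No split file given: shuffle deterministically and cut into three parts.
    shuffled = [str(i) for i in ids]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])

    assignment = {}
    for pos, chebi_id in enumerate(shuffled):
        if pos < n_train:
            assignment[chebi_id] = "train"
        elif pos < n_train + n_val:
            assignment[chebi_id] = "validation"
        else:
            assignment[chebi_id] = "test"

    write_split(assignment, save_path)
    return assignment
```

Calling `get_split(ids, "splits.csv")` once and then `get_split(ids, "other.csv", split_csv="splits.csv")` should give the same assignment both times, which is the point of saving the split.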
sfluegel05 commented 1 month ago

As far as I am aware, this branch originates from the branch used in PR #29, so I will merge it directly into dev. I also made some minor changes: The CLI for the migration script now uses jsonargparse, so one can pass a config file directly instead of resolving the parameters of each class with separate arguments (this also covers other ChEBI configurations such as ChEBIOverXPartial). Files that are not found are skipped. I also added some hints for users.
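For readers who want a picture of the "skip missing files" behaviour, a rough sketch follows. It is only an illustration under assumptions: `migrate_files` and the old-path to new-path mapping are hypothetical and do not reflect the actual migration script or its directory layout.

```python
import shutil
from pathlib import Path


def migrate_files(old_dir, new_dir, file_map):
    """Copy files from the old layout into the new one, skipping missing sources.

    file_map maps a path relative to old_dir onto a path relative to new_dir
    (hypothetical structure for this example). Returns (copied, skipped).
    """
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    copied, skipped = [], []
    for old_rel, new_rel in file_map.items():
        src = old_dir / old_rel
        if not src.is_file():
            # Hint for the user instead of aborting the whole migration.
            print(f"Skipping {src}: file not found")
            skipped.append(old_rel)
            continue
        dst = new_dir / new_rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies content and file metadata
        copied.append(old_rel)
    return copied, skipped
```

Returning the skipped paths as well as printing them lets a caller decide whether a missing file is acceptable or should be treated as an error.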