ChEB-AI / python-chebai

GNU Affero General Public License v3.0

Data handling restructure #29

Closed aditya0by0 closed 4 months ago

aditya0by0 commented 6 months ago

Goal


A special case for the data splits is the chebi_version_train:

Use case

You want to compare two models trained on different versions of ChEBI. In order to make a fair comparison, you need to evaluate both models on the same test set (and train them on training sets that don't overlap with this test set).

Tasks

Most of the functionality for this is already implemented; it just needs to be adapted to the dynamic data splits. In the end, no new files should be created for specific splits.
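To illustrate the use case above, here is a minimal sketch of how a shared test set and two non-overlapping training sets could be derived in memory, without writing split-specific files. It is not the repository's implementation: the function name `split_for_versions`, its parameters, and the choice to draw the test set from the entries common to both versions are all assumptions made for this example.

```python
import random


def split_for_versions(ids_old, ids_new, test_fraction=0.2, seed=0):
    """Hypothetical helper: return (shared_test, train_old, train_new).

    ids_old / ids_new are the entry IDs available in the two ChEBI versions.
    """
    # Assumption: only entries present in both versions are eligible
    # for the shared test set, so both models can be evaluated on them.
    shared = sorted(set(ids_old) & set(ids_new))

    # Seeded shuffle so the split is reproducible without storing files.
    rng = random.Random(seed)
    rng.shuffle(shared)

    n_test = int(len(shared) * test_fraction)
    test = set(shared[:n_test])

    # Each training set is everything in its version except the test entries.
    train_old = sorted(set(ids_old) - test)
    train_new = sorted(set(ids_new) - test)
    return sorted(test), train_old, train_new


# Toy IDs standing in for ChEBI entries of an older and a newer version.
old_version_ids = [1, 2, 3, 4, 5, 6]
new_version_ids = [2, 3, 4, 5, 6, 7, 8]
test_ids, train_old_ids, train_new_ids = split_for_versions(
    old_version_ids, new_version_ids, test_fraction=0.4, seed=1
)
```

Because the split is recomputed from the ID sets and a seed, both models see the same test entries and neither training set overlaps with them.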

sfluegel05 commented 6 months ago

Some additional changes we talked about:

aditya0by0 commented 5 months ago

Hi @sfluegel05, please review the PR. I need a bit more time to look into tutorials/eval_model_basic.ipynb because of some errors.

aditya0by0 commented 5 months ago

Hi @sfluegel05, I have made the changes to tutorials/eval_model_basic.ipynb and to the related .py files. Please review the PR.

aditya0by0 commented 5 months ago

Also, I have updated the wiki page Data-Management/Data folder structure to reflect the new folder structure introduced by this PR. Please review.

aditya0by0 commented 5 months ago

Also, can you please confirm whether merging this PR will also close the issue below?

aditya0by0 commented 5 months ago

Hi @sfluegel05, since you have approved the changes, can you please merge the PR if there are no further actions or changes left for this issue?

sfluegel05 commented 5 months ago

I will merge this PR, but before that, I have another task. Since this change will require other users of this tool to change their datasets, it would be useful to have a migration script. I created an issue for that: #34