aditya0by0 closed this 4 months ago.
Some additional changes we talked about:
- Use load_processed_data instead of dataloader to avoid code duplication
- Check that tutorials/eval_model_basic.ipynb still works, and update it if necessary

Hi @sfluegel05, please review the PR. I request a bit more time to look into tutorials/eval_model_basic.ipynb due to some errors.
Hi @sfluegel05, I have made the changes to tutorials/eval_model_basic.ipynb and to the other relevant .py files related to it. Please review the PR.
Also, I have updated the wiki page Data-Management/Data folder structure to reflect the new folder structure introduced by this PR. Please review.
Also, can you please confirm whether merging this PR will also close the issue below?
Hi @sfluegel05, since you have approved the changes, can you please merge the PR if there are no further actions or changes left for this issue?
I will merge this PR, but before that, I have another task. Since this change will require other users of this tool to change their datasets, it would be useful to have a migration script. I created an issue for that: #34
Goal
Current structure:
- chebi.obo (raw): data/ChEBIX/chebi_version/raw
- Processed data: data/ChEBIX/chebi_version/processed/encoding
New structure:
- data/chebi_version/raw
- data/chebi_version/ChEBIX/processed
- data/chebi_version/ChEBIX/processed/encoding
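For the migration script requested in #34, moving one dataset from the current layout to the new one could be sketched as follows. This is a hypothetical illustration, not the planned script: it assumes ChEBIX and chebi_version are plain folder names (e.g. the made-up ChEBI50 and chebi_v231), that the encoding folders sit inside processed/ and therefore move along with it, and that new_location and migrate are illustrative helper names.

```python
import shutil
from pathlib import Path


def new_location(old_dir: Path, data_root: Path) -> Path:
    """Map a folder from the current layout to the new one (hypothetical helper).

    Current: data_root/<ChEBIX>/<chebi_version>/<rest...>
    New:     raw       -> data_root/<chebi_version>/raw
             processed -> data_root/<chebi_version>/<ChEBIX>/processed/...
    """
    dataset, version, *rest = old_dir.relative_to(data_root).parts
    if rest and rest[0] == "raw":
        # Raw data (chebi.obo) no longer lives under a dataset-specific folder
        return data_root.joinpath(version, *rest)
    # Processed data (including encoding subfolders) moves below the version folder
    return data_root.joinpath(version, dataset, *rest)


def migrate(data_root: Path, dataset: str, version: str) -> None:
    """Move raw/ and processed/ of one ChEBIX dataset into the new layout."""
    old_base = data_root / dataset / version
    for sub in ("raw", "processed"):
        src = old_base / sub
        if not src.exists():
            continue
        dst = new_location(src, data_root)
        if dst.exists():
            # Another ChEBIX dataset of the same version already moved its raw
            # data there; leave this copy in place instead of overwriting it
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
```

Skipping an existing destination is a deliberate choice in this sketch: several ChEBIX datasets of one ChEBI version would share a single raw folder, so the first migrated copy wins and later duplicates are left for manual cleanup.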
A special case for the data splits is the chebi_version_train parameter.
Use case
You want to compare two models trained on different versions of ChEBI. In order to make a fair comparison, you need to evaluate both models on the same test set (and train them on training sets that don't overlap with this test set).
Tasks
- If chebi_version_train is set, create and process two datasets (one for chebi_version, one for chebi_version_train).
- Use the chebi_version_train data, but with the test set from chebi_version.
- Use a chebi_version test set that has all the same entries, but only the labels that also appear in the classes.txt of chebi_version_train.
- ChEBIOver50(chebi_version=231) and ChEBIOver50(chebi_version=231, chebi_version_train=200) should have the same IDs in their test sets (but different numbers of labels). The latter should also pass the test for no overlaps.

Most of the functionality for this is already implemented; it just needs to be adapted to the dynamic data splits. In the end, no new files should be created for specific splits.