Reproducibility: Datasets required

JudeWells / chainsaw

MIT License


Reproducibility: Datasets required #33

Closed: FoxHarley closed this issue 2 months ago

FoxHarley commented 7 months ago

Hello!

Great work on this project!

In order to reproduce the performance evaluation, I would like to know which datasets were used for training and validating your approach. While the methodology in your paper is quite thorough, it does not offer a deterministic approach to generate these, and I can't find anything in your repo either.

Could you please either publish the identifiers of your data splits or provide some other means to reproduce the datasets?

Best regards, Nicole

KYQiu21 commented 7 months ago

Same here. And it would be great to have a sample training script as well :)

FoxHarley commented 6 months ago

Hey there! Any update?

JudeWells commented 6 months ago

We are currently benchmarking against other models, which will require retraining our model, so we will be submitting the training and validation datasets in the next couple of weeks.

JudeWells commented 5 months ago

The train / validation / test splits are now uploaded in the data_and_benchmarking folder. The preprint has also been updated to show results on the new benchmarking dataset.

KYQiu21 commented 5 months ago

Thanks very much for your efforts 👍