Closed by o-turbitt-mdc 10 months ago
Thanks for the comments! It might be a typo in the paper. I will double-check the statistics and update the paper soon. Thanks again for the catch!
Is there any update on this? It's important to the comparison.
Hi Zhihan,
Thank you very much for your hard work on this repo, much appreciated!
I have been trying to reproduce the DNABERT2 results on the COVID variant prediction task from the GUE benchmark, and I have noticed a discrepancy between the train, validation, and test split sizes reported in the paper and the sizes of the files provided here.
The paper reports a train / validation / test breakdown of 77669 / 7000 / 7000. However, using the files provided here, I get 73335 / 9168 / 9168.
I haven't checked any of the other benchmark datasets but this may be an issue with others too.
Would it be possible to update the provided datasets so that they match the splits given in the paper? This would allow others to reproduce and validate your results.
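For anyone who wants to check the split sizes themselves, here is a minimal sketch of how the counts above could be obtained. It assumes each split is stored as `train.csv`, `dev.csv`, and `test.csv` with one header row; those file names, the header, and the directory path are assumptions for illustration, not something confirmed in this thread, and `count_examples` and `split_sizes` are hypothetical helper names.

```python
import csv
from pathlib import Path


def count_examples(csv_path, has_header=True):
    """Count non-empty data rows in a CSV file, optionally skipping one header row."""
    with open(csv_path, newline="") as f:
        # csv.reader copes with quoted fields; blank lines come back as [] and are skipped.
        n_rows = sum(1 for row in csv.reader(f) if row)
    return max(n_rows - 1, 0) if has_header else n_rows


def split_sizes(data_dir, splits=("train", "dev", "test")):
    """Return {split name: number of examples} for <data_dir>/<split>.csv.

    The split file names are an assumption; adjust them to the actual layout.
    """
    return {name: count_examples(Path(data_dir) / f"{name}.csv") for name in splits}


# Example call (hypothetical path, replace with the real dataset directory):
# print(split_sizes("path/to/covid"))
```

As a side note, the two breakdowns sum to nearly the same total (77669 + 7000 + 7000 = 91669 versus 73335 + 9168 + 9168 = 91671), so the difference may come from the split ratio rather than from missing data, though that is only a guess.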