MAGICS-LAB / DNABERT_2

[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
Apache License 2.0

Fine-tuning process #43

Open smruti241 opened 1 year ago

smruti241 commented 1 year ago

Hi @Zhihan1996, thanks for providing the code for fine-tuning DNABERT-2. However, there is no mention of how to generate the dev.csv, test.csv, and train.csv files from our own dataset, or of how to assign the labels 1 and 0 to the sequences. Could you please let me know how to do that?
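
For what it's worth, the GUE datasets shipped with the repo appear to be plain CSV files with a `sequence,label` header, so one way to prepare your own data is to write the same layout yourself. Below is a minimal, unofficial sketch: the toy sequences, the 80/10/10 split, and the binary labels are all placeholders, and the labels have to come from your own annotation (e.g. 1 = site present, 0 = background), not random assignment.

```python
import csv
import random

# Hypothetical toy data: each entry is (DNA sequence, label).
# The labels must reflect your own annotation of the task,
# e.g. 1 = positive class, 0 = negative class.
labeled_sequences = [
    ("ATGCGTACGTTAGCCTAGGA", 1),
    ("TTGACCGGATACCATTGCAA", 0),
    ("GGCTTAACGTGTCCAATCGA", 1),
    ("CCATGGTTAACGGATCCTTA", 0),
]

random.seed(0)
random.shuffle(labeled_sequences)

# Assumed 80/10/10 split into train/dev/test (use a real dataset
# with many more sequences than this toy list).
n = len(labeled_sequences)
splits = {
    "train.csv": labeled_sequences[: int(0.8 * n)],
    "dev.csv":   labeled_sequences[int(0.8 * n): int(0.9 * n)],
    "test.csv":  labeled_sequences[int(0.9 * n):],
}

for filename, rows in splits.items():
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sequence", "label"])  # header row, matching the GUE CSVs
        writer.writerows(rows)
```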

TheRainInSpain commented 1 year ago

I generated the CSV files using the Python csv library, writing each sequence and its label into one row. But when I ran the code, an error occurred (described in #42). I wonder whether this is the right way to generate the CSV files.
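
In case it helps with debugging the error in #42, here is a small, hypothetical sanity check that rereads the generated files and verifies that each data row has exactly two fields and an integer label. It only mirrors my understanding of what the fine-tuning script expects, so adjust it if your copy of the loader differs.

```python
import csv
import sys

def check_csv(path: str) -> None:
    """Check a DNABERT-2-style fine-tuning CSV: header + (sequence, integer label) rows."""
    with open(path) as f:
        rows = list(csv.reader(f))
    header, data = rows[0], rows[1:]
    print(f"{path}: header={header}, {len(data)} data rows")
    for line_no, row in enumerate(data, start=2):
        if len(row) != 2:
            sys.exit(f"{path}:{line_no}: expected 2 columns, got {len(row)}")
        seq, label = row
        if not seq or set(seq.upper()) - set("ACGTN"):
            print(f"{path}:{line_no}: warning: unexpected characters in sequence")
        try:
            int(label)
        except ValueError:
            sys.exit(f"{path}:{line_no}: label {label!r} is not an integer")

for name in ("train.csv", "dev.csv", "test.csv"):
    check_csv(name)
```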

smruti241 commented 1 year ago

But how did you label the sequences? Based on which parameter, or by randomly assigning 1 or 0? Randomly assigning labels wouldn't make any sense.

smruti241 commented 1 year ago

#42

smruti241 commented 1 year ago

I have COVID data, and DNABERT-2 already includes a COVID dataset for fine-tuning. I just want to know how you determined that each sequence should be given 1, 2, 3, 4, etc., up to 9 as its label. Please let me know @Zhihan1996