dmis-lab / biobert-pytorch

PyTorch Implementation of BioBERT
http://doi.org/10.1093/bioinformatics/btz682

train_dev.tsv in datasets #38

Open · jinniulema opened this issue 1 year ago

jinniulema commented 1 year ago

Why are there four splits: train.tsv, train_dev.tsv, devel.tsv, and test.tsv? Is train_dev.tsv simply train.tsv and devel.tsv merged together? Also, in run_ner.py, for example, why is train_dataset built from train_dev.txt while eval_dataset is built from eval.txt? Thanks.
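Until a maintainer confirms, one way to answer the merge question for a given dataset is to compare the files directly. The sketch below is not part of biobert-pytorch; it assumes the files are CoNLL-style (one `token<TAB>label` per line, a blank line between sentences) and that all three live in one directory. `read_sentences` and `compare_splits` are hypothetical helper names. It reports whether train_dev.tsv equals train.tsv followed by devel.tsv in order, and, more loosely, whether it contains exactly the same sentences in any order.

```python
from collections import Counter
from pathlib import Path


def read_sentences(path):
    """Split a CoNLL-style file (one token per line, blank line between
    sentences) into a list of sentences, each a tuple of its lines."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if line.strip():
                current.append(line)
            elif current:  # blank line closes the current sentence
                sentences.append(tuple(current))
                current = []
    if current:  # the file may not end with a blank line
        sentences.append(tuple(current))
    return sentences


def compare_splits(data_dir):
    """Assumed layout: data_dir holds train.tsv, devel.tsv, train_dev.tsv."""
    d = Path(data_dir)
    train = read_sentences(d / "train.tsv")
    devel = read_sentences(d / "devel.tsv")
    train_dev = read_sentences(d / "train_dev.tsv")
    return {
        # exact concatenation: train sentences first, then devel sentences
        "ordered_concat": train_dev == train + devel,
        # same sentences (with duplicates counted), order ignored
        "same_multiset": Counter(train_dev) == Counter(train + devel),
    }
```

Call `compare_splits` on the directory that holds the split files. If `same_multiset` is true but `ordered_concat` is false, the file was probably merged and then shuffled.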

aclarkse commented 3 months ago

Hello, I have the same question. I am curating my own dataset for training, and I am wondering whether the train_dev files are merged versions of the train and devel splits.
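If train_dev does turn out to be a plain merge, a custom dataset would need an equivalent file. The following is a minimal sketch under that unconfirmed assumption, not the BioBERT authors' preprocessing. `merge_splits` is a hypothetical helper. It assumes CoNLL-style files with `\n` line endings and forces exactly one blank line between the two inputs so that the last train sentence does not run into the first devel sentence.

```python
from pathlib import Path


def merge_splits(train_path, devel_path, out_path):
    """Write the train file followed by the devel file to out_path.

    Assumes CoNLL-style input where a blank line separates sentences.
    Leading/trailing newlines of each input are trimmed and a single
    blank line is placed between them and at the end of the output.
    """
    parts = []
    for path in (train_path, devel_path):
        text = Path(path).read_text(encoding="utf-8").strip("\r\n")
        if text:  # skip empty inputs so no stray separators are written
            parts.append(text)
    merged = "\n\n".join(parts) + "\n\n" if parts else ""
    Path(out_path).write_text(merged, encoding="utf-8")
```

For example, `merge_splits("train.tsv", "devel.tsv", "train_dev.tsv")` in the dataset directory. Note that this simple approach keeps the original sentence order and does not deduplicate.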