dmis-lab / biobert

Bioinformatics'2020: BioBERT: a pre-trained biomedical language representation model for biomedical text mining
http://doi.org/10.1093/bioinformatics/btz682

RE finetune using custom dataset, taskname? #149

Open ChloeJKim opened 3 years ago

ChloeJKim commented 3 years ago

Hi,

I want to run the relation extraction model on my own dataset. Fine-tuning requires me to set task_name to either gad or euadr. If I'm using my own dataset, do I still need to specify task_name, or can I omit it? If I can, where in the code should I make the change?

Thank you!

wonjininfo commented 3 years ago

Hi Chloe, yes, you need to provide task_name. If your dataset is a binary classification task, you can use either of them: euadr and gad are processed in the same way (both use BioBERTProcessor). https://github.com/dmis-lab/biobert/blob/37599fb978e3b584a6e9aa9abca1f38588bfff4f/run_re.py#L914-L917

Please note, however, that the chemprot dataset is a multi-class classification task. It is therefore processed differently, and the same holds for the evaluation script.
Thank you for your interest in our work! Best, WonJin
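
For readers following along, the idea that two task names can resolve to the same processor can be sketched as below. This is only an illustration, not the code in run_re.py: the class names `BinaryREProcessor` and `MultiClassREProcessor`, the `PROCESSORS` dict, the `get_processor` helper, and the multi-class label names are all hypothetical stand-ins.

```python
# Simplified sketch of a task_name -> processor lookup.
# All names here are hypothetical; see run_re.py for the real mapping.

class BinaryREProcessor:
    """Stand-in for a processor that handles binary relation extraction."""

    def get_labels(self):
        # Two classes: relation absent / relation present.
        return ["0", "1"]


class MultiClassREProcessor:
    """Stand-in for a processor that handles a multi-class task."""

    def get_labels(self):
        # Illustrative label names only, not the real ChemProt labels.
        return ["class_a", "class_b", "class_c", "none"]


# Both binary task names point at the same processor class.
PROCESSORS = {
    "gad": BinaryREProcessor,
    "euadr": BinaryREProcessor,
    "chemprot": MultiClassREProcessor,
}


def get_processor(task_name):
    """Return a processor instance for the given task name (case-insensitive)."""
    key = task_name.lower()
    if key not in PROCESSORS:
        raise ValueError("Task not found: %s" % key)
    return PROCESSORS[key]()


if __name__ == "__main__":
    gad = get_processor("gad")
    euadr = get_processor("euadr")
    # Same class, so the input data is read and labelled identically.
    print(type(gad) is type(euadr))
```

Under this kind of mapping, a custom binary dataset can reuse either binary task name as long as its files follow the format that processor expects.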

ChloeJKim commented 3 years ago

I see. My dataset is binary. The reason I asked is that I get different specificity values when evaluating with gad versus euadr, even though I used the same dataset. [Image: EUADR results] [Image: GAD results]
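
One way to check which number is right is to recompute specificity directly from the gold labels and the predicted labels, independently of any evaluation script. The sketch below is not the repository's evaluation code; the function name `binary_metrics` and the assumption that labels are encoded as 0 and 1 (with 1 as the positive class) are mine.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and specificity for binary labels.

    Assumes y_true and y_pred are equal-length sequences and that any
    label different from `positive` counts as the negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth != positive and pred != positive:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Return None instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else None

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),  # TN / (TN + FP)
    }


if __name__ == "__main__":
    gold = [1, 0, 0, 1, 0]
    predicted = [1, 0, 1, 0, 0]
    print(binary_metrics(gold, predicted))
```

If the value computed this way matches one of the two reported results but not the other, the difference most likely comes from how the evaluation step handles that task name rather than from the model's predictions.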

Adair0319 commented 11 months ago

Hi Chloe, I seem to have run into the same problem. Have you found the cause?