dmis-lab / biobert

Bioinformatics'2020: BioBERT: a pre-trained biomedical language representation model for biomedical text mining
http://doi.org/10.1093/bioinformatics/btz682

How to "re-train" BioBert on a custom medical corpus? #176

Open peterphancong opened 2 years ago

peterphancong commented 2 years ago

Thank you for your wonderful BioBERT pretrained model. As part of my work, I would like to further pre-train BioBERT on my own medical corpus (without labels) before using it for text embedding. However, all the source code I have found requires labelled data (fine-tuning, not further pre-training). Could you please provide an example of how to do that?

Thank you very much.