dainlp / acl2020-transition-discontinuous-ner


Which pre-trained word embeddings and ELMo checkpoint are used? #2

Closed LorrinWWW closed 4 years ago

LorrinWWW commented 4 years ago

Thanks for sharing the code. I am wondering which pre-trained features are used. Since these corpora are in the biomedical domain, word embeddings pre-trained on PubMed abstracts were very useful in our preliminary experiments. The same holds for ELMo, which also has a 'pubmed' checkpoint that significantly outperformed the original checkpoints on ShARe.

dainlp commented 4 years ago

The results reported in this paper are based on publicly available generic models: GloVe pre-trained on Wikipedia (the 6B version from Stanford) and ELMo pre-trained on Wikipedia and news (the 5.5B version from AllenNLP).
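For reference, the Stanford GloVe releases are plain text files with one token and its vector per line, so they can be loaded without any special library. A minimal sketch (the file path and helper names here are illustrative, not part of this repo):

```python
import numpy as np

def parse_glove_line(line):
    """Parse one line of a GloVe text file into (word, vector)."""
    parts = line.rstrip().split(" ")
    word = parts[0]
    vec = np.array([float(x) for x in parts[1:]], dtype=np.float32)
    return word, vec

def load_glove(path):
    """Load a whole GloVe file into a dict mapping word -> vector."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, vec = parse_glove_line(line)
            embeddings[word] = vec
    return embeddings

# Example (assumes the 6B, 100-dimensional file has been downloaded):
# vectors = load_glove("glove.6B.100d.txt")
```

Swapping in biomedical vectors (e.g. ones trained on PubMed) would only require pointing `load_glove` at a different file in the same text format.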

Regarding pre-training ELMo or word vectors on biomedical data, our NAACL 2019 paper reports some results on the CADEC dataset: https://www.aclweb.org/anthology/N19-1149.pdf

We found that CADEC, which contains forum posts about medications, is actually more similar to online reviews than to PubMed.

We didn't do a hyper-parameter search on ShARe. We carried everything over from our experiments on CADEC and found those settings also work well on ShARe. That said, I believe word vectors or ELMo pre-trained on in-domain data, such as MIMIC, could further improve the results on ShARe.

LorrinWWW commented 4 years ago

Thanks for the quick reply and clear explanation!