LorrinWWW closed this issue 4 years ago
The results reported in this paper are based on these publicly available generic models: GloVe pretrained on Wikipedia (the 6B version from Stanford) and ELMo pretrained on Wikipedia and news (the 5.5B version from AllenNLP).
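For anyone reproducing this setup, the Stanford GloVe releases are plain-text files (one token per line followed by its vector components), so they can be loaded without any special library. A minimal sketch, using a tiny inline sample in place of the real file (the file name in the comment is illustrative):

```python
import io

def load_glove(file_obj):
    """Parse GloVe's plain-text format: each line is a token
    followed by its space-separated vector components."""
    vectors = {}
    for line in file_obj:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# Tiny inline sample standing in for e.g. glove.6B.100d.txt;
# in practice pass open("glove.6B.100d.txt", encoding="utf-8").
sample = io.StringIO("the 0.1 0.2 0.3\nmedication -0.4 0.5 0.6\n")
vecs = load_glove(sample)
print(vecs["medication"])  # [-0.4, 0.5, 0.6]
```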
Regarding pretraining ELMo or word vectors on biomedical data, our NAACL 2019 paper reports some results on the CADEC dataset: https://www.aclweb.org/anthology/N19-1149.pdf
We found that CADEC, which contains forum posts about medications, is actually more similar to online reviews than to PubMed.
We didn't do a hyper-parameter search on ShARe. We reused the settings from our CADEC experiments and found they work well on ShARe too. That said, I do believe word vectors or ELMo pretrained on in-domain data, such as MIMIC, could further improve the results on ShARe.
Thanks for the quick reply and clear explanation!
Thanks for sharing the code. I am wondering which pretrained features were used. Since these corpora are in the biomedical domain, word embeddings pretrained on PubMed abstracts were very useful in our preliminary experiments. The same holds for ELMo, which also has a 'pubmed' checkpoint that significantly outperformed the original one on ShARe.