flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

ELMO pubmed model #502

Closed: nstfk closed this issue 5 years ago

nstfk commented 5 years ago

ELMo now has a new model trained on the PubMed corpus (see https://allennlp.org/elmo under contributed models). Is this supported in Flair the same way as the others (original, medium, small and Portuguese)? Also, is it possible to embed sentences using my own pre-trained ELMo/BERT models? Thank you!

stefan-it commented 5 years ago

@nstfk The Portuguese model is currently supported.

If you want to use the new PubMed model, I'll open a PR to support it :)

nstfk commented 5 years ago

Wow, that was quick! And yes please, I am trying to evaluate and compare Flair/ELMo/BERT embeddings on medical downstream tasks.

stefan-it commented 5 years ago

The PR has been created :) Once it is merged, you can use the PubMed model with:

from flair.embeddings import ELMoEmbeddings

embeddings = ELMoEmbeddings('pubmed')

This will download the weights and options files for the ELMo PubMed model.

cpmss521 commented 5 years ago

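Hi, when I use:

from flair.data import Sentence
from flair.embeddings import ELMoEmbeddings

sentence = Sentence('The patient was given aspirin .')  # example input (hypothetical)

embedding = ELMoEmbeddings('pubmed')
embedding.embed(sentence)
for token in sentence:
    print(token)
    print(token.get_embedding())
    print(token.get_embedding().shape)

I get torch.Size([3072]). Why is it 3072? @stefan-it, thanks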

stefan-it commented 5 years ago

@cpmss521 By default, ELMo outputs three 1024-dimensional vectors for each token, one per layer. In Flair we currently concatenate all three layers, so each token embedding has 3 × 1024 = 3072 dimensions :) By the way, issue #647 discusses this further.
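To make the 3 × 1024 = 3072 arithmetic concrete, here is a minimal sketch in plain Python (no Flair or PyTorch needed) in which lists stand in for the per-layer ELMo vectors of a single token. The helper names (`fake_layer_outputs`, `concat_layers`, `average_layers`) are hypothetical illustrations, not Flair's API. The averaging variant is only a contrast showing how a different combination would keep 1024 dimensions; it is an assumption for illustration, not something this thread says Flair offers.

```python
# Minimal sketch: why concatenating three ELMo layers yields 3072 dimensions.
# Plain lists stand in for the real 1024-d per-layer tensors (assumption).

LAYER_DIM = 1024   # size of each ELMo layer output, per the thread
NUM_LAYERS = 3     # ELMo returns three layers per token


def fake_layer_outputs(num_layers=NUM_LAYERS, dim=LAYER_DIM):
    """Hypothetical helper: return `num_layers` dummy vectors of length `dim`.

    Layer i is filled with float(i) so the layers are easy to tell apart.
    """
    return [[float(i)] * dim for i in range(num_layers)]


def concat_layers(layers):
    """Concatenate all layer vectors end to end (the behaviour described for Flair)."""
    out = []
    for layer in layers:
        out.extend(layer)
    return out


def average_layers(layers):
    """Element-wise mean over layers: an illustrative alternative that keeps 1024 dims."""
    n = len(layers)
    return [sum(values) / n for values in zip(*layers)]


layers = fake_layer_outputs()
concatenated = concat_layers(layers)
averaged = average_layers(layers)

print(len(concatenated))  # 3072
print(len(averaged))      # 1024
```

Concatenation keeps every layer's information at the cost of a larger vector; any reduction such as averaging or picking a single layer trades some of that information for a smaller 1024-d embedding.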

cpmss521 commented 5 years ago

Thanks!