flairNLP / flair

A very simple framework for state-of-the-art Natural Language Processing (NLP)
https://flairnlp.github.io/flair/

Different dimensionality for the contextual word embedding #111

Closed dongfang91 closed 5 years ago

dongfang91 commented 6 years ago

Hello,

The hidden-state dimensionality of the LSTM language model you use is 2048, which means 4096 dimensions for a single word's contextual embedding. Have you tried different dimensionalities? And do you have performance numbers for both the language modeling task and the NER task?

Thanks!

alanakbik commented 6 years ago

Hello @dongfang91 - yes, we've tried two different language models: a big LM and a smaller, faster LM.

We distribute the trained models for both LMs and report all numbers. For the big LM, the evaluation numbers are here. For the small LM, the evaluation numbers are here (the small LM is used to train the "Fast English Models").

As you can see from these tables, the smaller LM comes reasonably close to the big LM. On CoNLL-03, for instance, we get 93.24 vs. 92.61 for big vs. small, and on OntoNotes NER we get 89.52 vs. 89.28 for big vs. small.
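(For reference, a minimal sketch of how the two setups could be compared in flair; the 'news-forward' / 'news-backward' and '-fast' model identifiers and the example sentence are assumptions for illustration, not something stated in this issue.)

```python
# Sketch: inspecting the per-word dimensionality of the big vs. the small
# ("fast") English character LMs shipped with flair.
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# big LM: forward + backward contextual string embeddings
big = StackedEmbeddings([
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])

# small LM: the "fast" variants behind the Fast English Models
small = StackedEmbeddings([
    FlairEmbeddings('news-forward-fast'),
    FlairEmbeddings('news-backward-fast'),
])

# total per-word dimensionality = forward hidden size + backward hidden size
print(big.embedding_length, small.embedding_length)

sentence = Sentence('George Washington went to Washington .')
big.embed(sentence)
for token in sentence:
    print(token.text, token.embedding.shape)
```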

Hope this helps!

dongfang91 commented 6 years ago

Thanks for your answer! The large dimensionality really helps a lot. Have you tried 300 or 500 before? Would that decrease performance a lot?

alanakbik commented 6 years ago

We have only experimented a little with smaller LMs. Perhaps a character LM with 512 hidden states (and maybe more than one layer) could work well, but I am not sure. Are you experimenting with smaller LMs? Do you have results? That would be interesting for us!
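(A rough sketch of how such a smaller character LM could be trained with flair's LanguageModelTrainer; the corpus path and hyperparameters below are placeholders, not the settings used for the released models.)

```python
# Sketch: training a forward character LM with a smaller hidden size (512).
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# character dictionary shipped with flair
dictionary = Dictionary.load('chars')

# corpus folder with a train/ split plus valid.txt and test.txt (placeholder path)
corpus = TextCorpus('/path/to/corpus', dictionary, forward=True, character_level=True)

# forward character LM with a smaller hidden size than the released models
language_model = LanguageModel(dictionary, is_forward_lm=True, hidden_size=512, nlayers=1)

trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('resources/lm/small-forward',
              sequence_length=250,
              mini_batch_size=100,
              max_epochs=10)
```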

dongfang91 commented 6 years ago

Thanks for your response! We are trying to train your LMs on a clinical dataset, and we will probably try a few smaller dimensionalities.

We tried your contextual string embeddings for each character, and they seem to work pretty well!
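(For illustration, a sketch of how a custom-trained clinical LM could then be plugged in as contextual string embeddings; the checkpoint path and example sentence are placeholders.)

```python
# Sketch: using a custom-trained character LM (e.g. trained on clinical text)
# as contextual string embeddings.
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# FlairEmbeddings also accepts a path to a language model checkpoint
clinical_forward = FlairEmbeddings('resources/lm/clinical-forward/best-lm.pt')

sentence = Sentence('Patient denies chest pain and shortness of breath .')
clinical_forward.embed(sentence)

for token in sentence:
    print(token.text, token.embedding.shape)
```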