Hello @dongfang91 - yes, we've tried two different language models: the big LM (the 2048-hidden-state one you mention) and a smaller, faster one.
We distribute the trained models for both LMs and report all numbers. For the big LM, the evaluation numbers are here. For the small LM, the evaluation numbers are here (the small LM is used to train the "Fast English Models").
As you can see from these tables, the smaller LM comes reasonably close to the big LM. On CoNLL-03, for instance, we get 93.24 vs. 92.61 for big vs. small, and on OntoNotes NER, we get 89.52 vs. 89.28 for big vs. small.
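For reference, loading the two variants looks roughly like this (a sketch against the current flair API; the FlairEmbeddings class and the 'news-forward' / 'news-forward-fast' model names may differ in older releases):

```python
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

# big LM (2048 hidden states per direction)
big_forward = FlairEmbeddings('news-forward')

# small LM, used behind the "Fast English Models"
fast_forward = FlairEmbeddings('news-forward-fast')

sentence = Sentence('Berlin is a city .')
fast_forward.embed(sentence)

# each token now carries this LM's contextual embedding
for token in sentence:
    print(token.text, token.embedding.shape)
```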
Hope this helps!
Thanks for your answers! The large dimensionality really helps a lot! Have you tried 300 or 500 before? Would that decrease performance a lot?
We have only experimented a little with smaller LMs. A character LM with 512 hidden states (and maybe more than one layer) might work well, but I am not sure. Are you experimenting with smaller LMs? Do you have results? That would be interesting for us!
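If you want to try that, a smaller character LM can be trained along the lines of our LM training tutorial; a rough sketch with hidden_size=512 (the corpus path is a placeholder and the training parameters are just examples):

```python
from flair.data import Dictionary
from flair.models import LanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

# train a forward character-level LM (set False for the backward LM)
is_forward_lm = True

# default character dictionary shipped with flair
dictionary: Dictionary = Dictionary.load('chars')

# placeholder corpus directory: expects a train/ folder plus valid.txt and test.txt
corpus = TextCorpus('/path/to/your/corpus',
                    dictionary,
                    is_forward_lm,
                    character_level=True)

# smaller LM: 512 hidden states instead of 2048
language_model = LanguageModel(dictionary,
                               is_forward_lm,
                               hidden_size=512,
                               nlayers=1)

trainer = LanguageModelTrainer(language_model, corpus)
trainer.train('resources/language_models/small-forward',
              sequence_length=250,
              mini_batch_size=100,
              max_epochs=10)
```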
Thanks for your response! We are trying to train your LMs on a clinical dataset, and we will probably try a few smaller dimensionalities.
We tried your contextual string embeddings for each character, and they seem to work pretty well!
Hello,
The dimensionality of the LSTM in your language model is 2048, which means 4096 dimensions for a single word's contextual embedding once the forward and backward LM outputs are concatenated. Have you tried different dimensionalities? And do you have performance numbers for both the language modeling task and the NER task?
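For concreteness, here is how I arrive at the 4096 (a sketch, assuming the current FlairEmbeddings / StackedEmbeddings API):

```python
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# forward and backward character LMs, 2048 hidden states each
embeddings = StackedEmbeddings([
    FlairEmbeddings('news-forward'),
    FlairEmbeddings('news-backward'),
])

sentence = Sentence('an example sentence')
embeddings.embed(sentence)

print(embeddings.embedding_length)  # 4096 = 2048 + 2048
for token in sentence:
    print(token.text, token.embedding.shape)  # torch.Size([4096])
```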
Thanks!