Closed · kermitt2 closed this 6 years ago
Thanks a lot for this work and for making it available!

I used ELMo contextualized embeddings in my Keras framework (DeLFT) and could reproduce the excellent results on the CoNLL 2003 NER task, actually slightly better than what you reported in your NAACL 2018 paper (92.47 averaged over 10 training runs, using the 5.5B ELMo model, warm-up, and concatenation with GloVe embeddings in a Lample 2016 BiLSTM-CRF architecture).

However, when using ELMo embeddings on the Ontonotes CoNLL-2012 NER dataset, I see a large drop of 5.0 f-score points compared to GloVe only. The drop is the same whether I use ELMo only or ELMo concatenated with GloVe.

Here is the evaluation with GloVe, without ELMo:

And here are the results with ELMo:

I see that the drop always concerns the named-entity classes related in some way to numbers (ORDINAL -65, CARDINAL -58, QUANTITY -53, DATE -18, etc.), while recognition of all the other classes actually improves with ELMo.

Two questions:

1. What could cause this behavior (apart from an implementation error on my side)? Did you observe something similar?
2. Are you using any special normalization of numbers on the corpus before training the biLM? I am using the default tokenization of Ontonotes/CoNLL-2012; should I perhaps use a different tokenization?
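For reference, here is a minimal sketch of the kind of setup described above: computing ELMo representations for pre-tokenized sentences and concatenating them with GloVe vectors per token before a BiLSTM-CRF tagger. This is not DeLFT's actual code; it uses the AllenNLP `Elmo` module, the GloVe lookup is a hypothetical stand-in, and the 5.5B model URLs are the ones AllenNLP published at the time and may have moved since.

```python
# Sketch (not DeLFT's actual code): ELMo representations for pre-tokenized
# sentences, concatenated with GloVe vectors, as input to a BiLSTM-CRF tagger.
import numpy as np
import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

# 5.5B model files as published by AllenNLP; adjust the URLs if they have moved.
OPTIONS = ("https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/"
           "2x4096_512_2048cnn_2xhighway_5.5B/"
           "elmo_2x4096_512_2048cnn_2xhighway_5.5B_options.json")
WEIGHTS = ("https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/"
           "2x4096_512_2048cnn_2xhighway_5.5B/"
           "elmo_2x4096_512_2048cnn_2xhighway_5.5B_weights.hdf5")

elmo = Elmo(OPTIONS, WEIGHTS, num_output_representations=1, dropout=0.0)

# Tokens are passed to ELMo verbatim as strings; its character CNN sees the
# raw surface forms, including digits ("3rd", "2012", ...).
sentences = [["He", "finished", "3rd", "in", "the", "race", "in", "2012", "."]]
character_ids = batch_to_ids(sentences)
elmo_out = elmo(character_ids)["elmo_representations"][0]  # (1, seq_len, 1024)

# Hypothetical GloVe lookup: replace with a real embedding table.
glove = {}  # token -> 300-d vector

def glove_vector(token):
    return glove.get(token.lower(), np.zeros(300, dtype=np.float32))

glove_out = torch.tensor(
    np.stack([[glove_vector(t) for t in s] for s in sentences]))

# Concatenate per token: (1, seq_len, 1024 + 300), fed to the BiLSTM-CRF.
features = torch.cat([elmo_out, glove_out], dim=-1)
```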
Thanks for posting this with the excellent details! This is quite interesting. I haven't noticed any strange effects with numbers, but I haven't looked at them in detail. We don't do any special tokenization or normalization of numbers when training the model; they are treated the same as all other tokens. When using datasets like Ontonotes, we also just use the existing, provided tokenization.

Hi @kermitt2 -- just to follow up: we aren't able to reproduce these results on our end, and we see improved performance with ELMo for all entity types in this dataset (including ORDINAL, etc.). Perhaps it's something particular to how you handle numbers vs. strings in your pre-processing pipeline?

Hello @matt-peters, sorry for the late reply. The follow-up is super useful; I will revisit and double-check my pre-processing, given that the issue is evidently on my side. The original ELMo model gave me similar results. Many thanks!

Just to close the loop on this: we saw a 0.882 development-set F1 using the 5.5B ELMo model on this dataset (we haven't checked the test-set performance, but it should be similar).

Hey, can you tell me where I can download the pretrained ELMo NER Ontonotes model? Thanks.
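One concrete way the "numbers vs. strings" hypothesis above can manifest: a pipeline that normalizes digits for the GloVe lookup table (a common trick) and accidentally feeds those normalized tokens to ELMo as well, which would hurt exactly the number-like entity classes (ORDINAL, CARDINAL, QUANTITY, DATE). A hypothetical sanity check, with all function names invented for illustration:

```python
# Hypothetical sanity check: ELMo's character CNN must see raw surface forms,
# so digit normalization applied for a word-embedding lookup (e.g. GloVe)
# must not leak into the tokens handed to ELMo.
def normalize_for_glove(token):
    # Common lookup-table trick: map every digit to "0" ("2012" -> "0000").
    return "".join("0" if ch.isdigit() else ch for ch in token)

def check_elmo_inputs(raw_sentences, elmo_sentences):
    for raw_sent, elmo_sent in zip(raw_sentences, elmo_sentences):
        for raw_tok, elmo_tok in zip(raw_sent, elmo_sent):
            assert isinstance(elmo_tok, str), (
                "ELMo expects string tokens, got %r" % type(elmo_tok))
            assert elmo_tok == raw_tok, (
                "token altered before ELMo: %r -> %r" % (raw_tok, elmo_tok))

raw = [["He", "finished", "3rd", "on", "May", "21", ",", "2012", "."]]
buggy = [[normalize_for_glove(t) for t in s] for s in raw]  # wrong ELMo input
check_elmo_inputs(raw, buggy)  # AssertionError: token altered: '3rd' -> '0rd'
```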