Closed StanSilas closed 7 years ago
Pre-trained models also take the information whether a word is capitalized or not into the context. So using pre-trained models might not produce what you would expect. One possible strategy is that you might want to train a tagging model on an annotated corpus where all words are lower-cased.
Thanks for this awesome project.
My project background : step 1) Audio -> Text step 2) Text -> Gather POS tags + Identify Named Entities from text. In the speech to text step, all of the text is more or less getting converted to lower case.
I'd like to know how to make the model identify/classify words based on the context and not on the whether the word is capitalized or not.
If the sentence is " I ate an apple while sitting inside the apple headquarters"
I want the RDRPOS tagger to identify first apple as a fruit and the second apple as an organization. At present it is identifying the Apple as an organisation only when it is capitalized. The RDRPOSTagger.py is able to identify headquarters as an NNP, but RDRPOSTagger4en.py is identifying it as a NN.
Thanks.