aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com

Polyglot NER - Modification to improve accuracy. #95

Open: LSMatos opened this issue 7 years ago

LSMatos commented 7 years ago

Hi, I'm doing a crime study and I'm using your library. I'm from Brazil and I process my texts with the Portuguese model (LANG: pt). There's only one problem: many streets in Brazil are named after people, usually a former president, as in "Getúlio Vargas Avenue" (Avenida Getúlio Vargas).

In this case, "Getúlio Vargas" should be recognized as LOC, not PER. My question is: is there any way to indicate that a name coming after certain place words should be recognized as LOC instead of PER? Because so many places carry real people's names, it is common in Brazil to put words such as "Bairro" (neighborhood), "Avenida" (avenue), or "Conjunto" (housing complex) before the name to mark it as a location.
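One possible workaround, which does not change the model itself, is to post-process the tagger's output: for each PER entity, look at the token just before it, and if that token is one of the place words, relabel the entity as LOC. The sketch below is a minimal, hypothetical version of that rule. It assumes entities are plain `(tag, start, end)` tuples over a token list (end index exclusive) and that tags look like `"I-PER"` / `"I-LOC"`; `relabel_places` and `PLACE_TRIGGERS` are illustrative names, not part of Polyglot, so you would need to convert Polyglot's entity objects to this shape yourself.

```python
# -*- coding: utf-8 -*-
# Hypothetical post-processing rule, not a Polyglot API.
# Entities are assumed to be (tag, start, end) tuples over a token list,
# with `end` exclusive, and tags shaped like "I-PER" or "PER".

# Place words taken from the issue; lowercase for case-insensitive matching.
PLACE_TRIGGERS = {"bairro", "avenida", "conjunto"}


def relabel_places(tokens, entities, triggers=PLACE_TRIGGERS):
    """Return a new entity list in which any PER span directly preceded
    by a trigger word is relabeled as LOC (span boundaries unchanged)."""
    fixed = []
    for tag, start, end in entities:
        is_person = tag.endswith("PER")  # matches "PER" and "I-PER"
        if is_person and start > 0 and tokens[start - 1].lower() in triggers:
            # Replace the "PER" suffix with "LOC", keeping any "I-" prefix.
            fixed.append((tag[:-3] + "LOC", start, end))
        else:
            fixed.append((tag, start, end))
    return fixed


# Usage: "Assalto na Avenida Getúlio Vargas" ("Robbery on Getúlio Vargas Avenue")
tokens = ["Assalto", "na", "Avenida", "Getúlio", "Vargas"]
entities = [("I-PER", 3, 5)]  # the tagger marked "Getúlio Vargas" as a person
print(relabel_places(tokens, entities))  # [('I-LOC', 3, 5)]
```

The rule only inspects the immediately preceding token, so a person mentioned on their own (e.g. "Getúlio Vargas morreu em 1954") keeps its PER tag. Extending the trigger set, or also absorbing the trigger word into the LOC span, are straightforward variations.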

I'm running Polyglot on Ubuntu 16.04 LTS with Python 2.7.