dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core

Add PTB3 escaping also in the StanfordNamedEntityRecognizer #394

Closed reckart closed 9 years ago

reckart commented 9 years ago
The parser and POS tagger components already use the PTB3 escaper to sanitize tokens
such as ( or ) before they are passed to the Stanford parser or tagger. The
StanfordNamedEntityRecognizer does not do this yet. Adding it should largely be a
copy-paste from the StanfordParser component. However, named entities are currently
detected over the document string, not over the tokens. We would need to change the
component so that it operates on the tokens, which is only possible if offsets are
preserved by the Stanford classifier code.
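
To illustrate the idea of escaping on the token level while keeping offsets intact, here is a minimal, self-contained sketch. It is not the DKPro Core or Stanford implementation: the class `Ptb3EscapeSketch`, the `Token` holder, and the `escapeTokens` method are hypothetical names made up for this example, and the escape table covers only the bracket replacements from the Penn Treebank convention (`-LRB-`, `-RRB-`, and so on), not everything the real PTB3 escaper may handle. The point it shows is that each token carries the escaped text together with the begin/end offsets of the original document, so results could be mapped back to the unescaped text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the actual DKPro Core / Stanford code.
class Ptb3EscapeSketch {
    // Bracket escapes following the Penn Treebank convention (assumed subset).
    private static final Map<String, String> ESCAPES = new HashMap<>();
    static {
        ESCAPES.put("(", "-LRB-");
        ESCAPES.put(")", "-RRB-");
        ESCAPES.put("[", "-LSB-");
        ESCAPES.put("]", "-RSB-");
        ESCAPES.put("{", "-LCB-");
        ESCAPES.put("}", "-RCB-");
    }

    // A token with its (possibly escaped) text and its offsets in the ORIGINAL document.
    static final class Token {
        final String text;
        final int begin;
        final int end;

        Token(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }
    }

    // Escapes a single token text; tokens without an escape are returned unchanged.
    static String escape(String tokenText) {
        return ESCAPES.getOrDefault(tokenText, tokenText);
    }

    // Builds escaped tokens from [begin, end) spans; the offsets are copied as-is,
    // so they still point into the unescaped document string.
    static List<Token> escapeTokens(String document, int[][] spans) {
        List<Token> result = new ArrayList<>();
        for (int[] span : spans) {
            String original = document.substring(span[0], span[1]);
            result.add(new Token(escape(original), span[0], span[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        String doc = "Berlin (Germany)";
        int[][] spans = { {0, 6}, {7, 8}, {8, 15}, {15, 16} };
        for (Token t : escapeTokens(doc, spans)) {
            System.out.println(t.text + " [" + t.begin + "," + t.end + ")");
        }
    }
}
```

Whether the Stanford classifier reports entity boundaries in a way that allows this mapping is exactly the open question raised above; the sketch only shows the bookkeeping on the caller's side.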

Original issue reported on code.google.com by richard.eckart on 2014-05-16 09:45:19

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2014-10-11 19:09:19