Tokens not aligned with Ontonotes Tokens

CogComp / cogcomp-nlpy

CogComp's light-weight Python NLP annotators

http://nlp.cogcomp.org/

Tokens not aligned with Ontonotes Tokens #95

Closed: sanjayss34 closed this issue 6 years ago

sanjayss34 commented 6 years ago

When I create a TextAnnotation for some text, the resulting tokens do not follow the OntoNotes tokenization. For instance, if I make a TextAnnotation for "Bin Laden 's", the tokens are ["Bin", "Laden", "'", "s"], whereas the OntoNotes text treats "'s" as a single token. This is a problem when I try to compare the NER results I get from the system with the gold results, because the token indices no longer line up. Is there a way to specify the list of tokens as input rather than the full text string?

nitishgupta commented 6 years ago

I think the solution proposed for this will still not work. I recently added functionality to work with pre-tokenized text; it only works with the local pipeline. The doc() function now takes pretokenized=True as an argument, which solves this issue. See the README for the update.
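
For anyone landing here later, a minimal sketch of what the pre-tokenized call might look like. Only doc() and pretokenized=True come from this thread. The helper names (ontonotes_sentences, annotate_pretokenized), the nested-list input shape (one token list per sentence), and the commented-out import path and class name are assumptions for illustration, so check the README before relying on them.

```python
from typing import List, Sequence


def ontonotes_sentences(lines: Sequence[str]) -> List[List[str]]:
    """Split whitespace-delimited gold lines into one token list per sentence.

    Hypothetical helper: assumes each non-empty line is one sentence whose
    gold tokens are separated by single spaces, as in "Bin Laden 's".
    """
    return [line.split() for line in lines if line.strip()]


def annotate_pretokenized(pipeline, sentences: List[List[str]]):
    """Hand gold tokens to doc() so the pipeline does not re-tokenize them.

    pretokenized=True is the argument named in this thread; passing a list
    of token lists is an assumption about the expected input shape.
    """
    return pipeline.doc(sentences, pretokenized=True)


sentences = ontonotes_sentences(["Bin Laden 's network"])
print(sentences)  # [['Bin', 'Laden', "'s", 'network']]

# With the library installed (import path and class name are assumptions):
# from ccg_nlpy import local_pipeline
# ta = annotate_pretokenized(local_pipeline.LocalPipeline(), sentences)
```

Because the gold tokens are passed through unchanged, "'s" stays a single token and the token indices should line up with the gold NER spans.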