Closed sanjayss34 closed 6 years ago
I think the solution proposed for this will still not work.
I recently added functionality to work with pre-tokenized text.
It only works with the local pipeline. The doc() function now takes pretokenized=True
as an argument, which should resolve this issue.
See the README for details on the update.
When I create a TextAnnotation for some text, the resulting tokens are not in the OntoNotes format. For instance, if I make a TextAnnotation for "Bin Laden 's", the tokens are ["Bin", "Laden", "'", "s"]. This is a problem when I try to compare the NER results from the system with the gold results. Is there a way to specify the list of tokens as input rather than the full text string?
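
To make the mismatch concrete, here is a rough sketch of how system tokens could be aligned back to gold tokens by character offsets when the two tokenizations differ. It does not use the library at all: the helper names (char_spans, align, project_labels), the gold tokenization ["Bin", "Laden", "'s"], and the example labels are all assumptions made for illustration.

```python
def char_spans(tokens):
    """Return (start, end) offsets of each token in the whitespace-free text."""
    spans = []
    pos = 0
    for tok in tokens:
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def align(gold_tokens, system_tokens):
    """For each gold token, list the indices of the system tokens that overlap it."""
    if "".join(gold_tokens) != "".join(system_tokens):
        raise ValueError("the two tokenizations do not cover the same text")
    sys_spans = char_spans(system_tokens)
    alignment = []
    for g_start, g_end in char_spans(gold_tokens):
        overlapping = [
            i for i, (s_start, s_end) in enumerate(sys_spans)
            if s_start < g_end and s_end > g_start
        ]
        alignment.append(overlapping)
    return alignment


def project_labels(alignment, system_labels):
    """Give each gold token the label of the first system token that overlaps it."""
    return [system_labels[indices[0]] for indices in alignment]


# Assumed gold tokenization (hypothetical, OntoNotes-style clitic).
gold = ["Bin", "Laden", "'s"]
# Tokens reported in the issue.
system = ["Bin", "Laden", "'", "s"]
# Hypothetical system NER labels, one per system token.
system_labels = ["B-PER", "I-PER", "O", "O"]

alignment = align(gold, system)
gold_side_labels = project_labels(alignment, system_labels)
```

Taking the first overlapping label is only one possible choice; a majority vote over the overlapping tokens would be another. Passing the gold tokens directly as input, if the pipeline supports it, avoids the need for this kind of alignment altogether.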