Open vlejd opened 7 years ago
Any progress on this issue? I'm also looking for a correct way of retrieving the start and stop index of an entity within the original "raw" string. Any suggestions on how to do that?
I think I have found an answer (maybe not the best one, but it works for now):
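from polyglot.text import Text

ptext1 = Text(text1)
prevIndex = 0  # resume each search here so repeated mentions map to later occurrences
for sent in ptext1.sentences:
    for entity in sent.entities:
        print(entity.tag, entity, entity.start, entity.end)
        # entity[0] is the first token of the entity; locate it from prevIndex onwards
        currentIndex = ptext1.index(entity[0], prevIndex)
        # note: the end index printed here covers only that first token
        print('startindex={}, endindex={}'.format(currentIndex, currentIndex+len(entity[0])))
        prevIndex = currentIndex+len(entity[0])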
This gives the start index and end index of an entity within the original string (as written, the end index covers only the entity's first token, entity[0]).
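For multi-word entities such as John Smith, the same forward-search idea can be extended so the end index covers the last token as well. The following is only a minimal sketch in plain Python, not part of polyglot: `entity_char_spans` is a hypothetical helper, and it assumes each entity is given as a list of token strings that appear verbatim and in order in the raw text.

```python
def entity_char_spans(raw_text, entities):
    """Return (start, end) character offsets for each entity.

    Assumption: `entities` is a list of token lists, ordered as they
    occur in `raw_text`, and every token appears verbatim in the text.
    """
    spans = []
    cursor = 0  # never search before the end of the previous entity
    for tokens in entities:
        start = raw_text.index(tokens[0], cursor)
        pos = start
        end = start
        for token in tokens:
            # walk through the entity's tokens left to right
            pos = raw_text.index(token, pos)
            end = pos + len(token)
            pos = end
        spans.append((start, end))
        cursor = end
    return spans


raw = "Yesterday John Smith met Mary in New York."
entities = [["John", "Smith"], ["Mary"], ["New", "York"]]
spans = entity_char_spans(raw, entities)
for (start, end), tokens in zip(spans, entities):
    print(tokens, start, end, repr(raw[start:end]))
```

Because this relies on substring search, a short token can match inside an earlier, longer word (for example "in" inside "main"), so the offsets should be checked against the actual text before being used for highlighting.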
I want to highlight all entities in the raw text. I use
This correctly finds John Smith, but smith.start is set to 11. How am I supposed to translate it into a position in the original text? For cleaner texts it is the index of the token where the mention starts. For texts with other non-letter characters it is something strange. Maybe change it to the index of the first character in the original sentence.
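One possible way to translate a token index into a character position is to first record the character offsets of every token, then look up the entity's first and last token. This is a rough sketch under assumptions, not polyglot's own behaviour: `token_offsets` and `token_span_to_char_span` are hypothetical helpers, the token list is assumed to appear verbatim and in order in the raw text, and the start/end values are assumed to be token indices with an exclusive end.

```python
def token_offsets(raw_text, tokens):
    """Map each token to its (start, end) character offsets in raw_text.

    Assumption: tokens occur verbatim and in order in raw_text.
    """
    offsets = []
    cursor = 0
    for token in tokens:
        start = raw_text.index(token, cursor)
        cursor = start + len(token)
        offsets.append((start, cursor))
    return offsets


def token_span_to_char_span(offsets, start_token, end_token):
    """Convert a token span (end exclusive, an assumption) to character offsets."""
    return offsets[start_token][0], offsets[end_token - 1][1]


raw = "Hi!! John Smith, born 1970, lives here."
tokens = ["Hi", "!", "!", "John", "Smith", ",", "born", "1970", ",",
          "lives", "here", "."]
offsets = token_offsets(raw, tokens)

# Tokens 3 and 4 are "John" and "Smith"
start, end = token_span_to_char_span(offsets, 3, 5)
highlighted = raw[:start] + "[" + raw[start:end] + "]" + raw[end:]
print(highlighted)
```

If the reported start value does not line up with the token list (as seems to happen for texts with unusual characters), this lookup will point at the wrong token, so it is worth comparing raw[start:end] with the entity text before trusting the result.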