GateNLP / python-gatenlp

Python text processing, pattern matching, and NLP framework
https://gatenlp.github.io/python-gatenlp/
Apache License 2.0

Index error when running TokenGazetteer #93

Closed by johann-petrak 3 years ago

johann-petrak commented 3 years ago

See https://github.com/GateNLP/python-gatenlp/discussions/92

johann-petrak commented 3 years ago

Copied traceback info:

IndexError Traceback (most recent call last)
in
8 doc2 = Annie(doc1)
9 properdoc = ProperDoc(doc1)
---> 10 gazdoc = GazDet(properdoc)
11 for ann in gazdoc.annset("Resume"):
12 doc2.annset("Resume").add_ann(ann)

in GazDet(doc)
5 for typ in details:
6 tgaz = TokenGazetteer("data/" + typ + ".def", fmt="gate-def", annset="", outset="Resume", outtype=typ)
----> 7 gazdoc = tgaz(doc)
8 return gazdoc

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in __call__(self, doc, annset, tokentype, septype, splittype, withintype, all, skip)
697 for segment_start, segment_end in segment_offs:
698 tokens = list(anns.within(segment_start, segment_end))
--> 699 for matches in self.find_all(tokens, doc=doc):
700 for match in matches:
701 starttoken = tokens[match.start]

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in find_all(self, tokens, doc, all, skip, fromidx, toidx, endidx, matchfunc)
617 idx = fromidx
618 while idx <= toidx:
--> 619 matches, maxlen, idx = self.find(
620 tokens,
621 doc=doc,

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in find(self, tokens, doc, all, fromidx, toidx, endidx, matchfunc)
550 endidx = len(tokens)
551 while idx <= toidx:
--> 552 matches, long = self.match(
553 tokens, idx=idx, doc=doc, all=all, endidx=endidx, matchfunc=matchfunc
554 )

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in match(self, tokens, doc, all, idx, endidx, matchfunc)
454 while j <= endidx:
455 if node.nodes:
--> 456 token = tokens[j]
457 if token.type == self.splittype:
458 break

IndexError: list index out of range
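One plausible reading of the traceback, not a confirmed diagnosis: `find` sets `endidx = len(tokens)` (line 550), and `match` then loops with `while j <= endidx` and indexes `tokens[j]` (lines 454-456), so `j` can reach `len(tokens)`, one past the last valid index. The sketch below reproduces only that loop shape on a plain list. The function names `scan_inclusive` and `scan_exclusive` are hypothetical, and this is not the gatenlp code or its actual fix.

```python
def scan_inclusive(tokens, start, endidx):
    """Walk tokens with an inclusive bound, like `while j <= endidx` in the traceback."""
    seen = []
    j = start
    while j <= endidx:
        seen.append(tokens[j])  # raises IndexError once j == len(tokens)
        j += 1
    return seen


def scan_exclusive(tokens, start, endidx):
    """Same walk with an exclusive bound, so j never reaches len(tokens)."""
    seen = []
    j = start
    while j < endidx:
        seen.append(tokens[j])
        j += 1
    return seen


tokens = ["Jane", "Doe", "Resume"]  # stand-ins for Token annotations

try:
    scan_inclusive(tokens, 0, len(tokens))
    raised = False
except IndexError:
    raised = True

walked = scan_exclusive(tokens, 0, len(tokens))
```

With `endidx` equal to `len(tokens)`, the inclusive loop fails on its final iteration, while the exclusive loop visits every token exactly once.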
johann-petrak commented 3 years ago

@mdorkhah would you be able to (privately) share a minimal test case?

mdorkhah commented 3 years ago

> @mdorkhah would you be able to (privately) share a minimal test case?

Sure, I just sent you an email.

johann-petrak commented 3 years ago

Thanks - I was not able to get that running yet, but I think I have actually found the bug already! :)

To test this would you be able to install gatenlp from the very latest version of the github main branch?

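One way to do this would be:

  • maybe create a separate environment for this and change into it
  • install gatenlp from latest github main branch: pip install -U git+https://github.com/GateNLP/python-gatenlp.git[EXTRAS] where EXTRAS is the list of extras you need also
  • NOTE: this gatenlp version requires the recent new version 3.0.4 of the GATE Python plugin for the GateWorker which should get used automatically.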

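For the install-from-main-branch step, here is a minimal sketch of building the pip invocation from Python. It expresses the extras as a PEP 508 direct reference (`gatenlp[...] @ git+URL`), which is a swapped-in spelling rather than the exact command quoted in this thread. The helper name `pip_install_command` and the extras names `extra1` and `extra2` are hypothetical placeholders, so substitute the extras you actually need.

```python
import sys

# Repository URL as given in the thread.
REPO_URL = "git+https://github.com/GateNLP/python-gatenlp.git"


def pip_install_command(extras=()):
    """Build a `pip install -U` command list for the gatenlp main branch.

    `extras` is a sequence of extras names (placeholders in this sketch).
    """
    if extras:
        # PEP 508 direct reference: name[extras] @ url
        requirement = f"gatenlp[{','.join(extras)}] @ {REPO_URL}"
    else:
        requirement = REPO_URL
    # Using sys.executable targets the currently active environment.
    return [sys.executable, "-m", "pip", "install", "-U", requirement]


cmd = pip_install_command(["extra1", "extra2"])
print(" ".join(cmd))
```

To run the install, the list can be passed to `subprocess.run(cmd, check=True)` from inside the separate environment.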
mdorkhah commented 3 years ago

> Thanks - I was not able to get that running yet, but I think I have actually found the bug already! :)
>
> To test this would you be able to install gatenlp from the very latest version of the github main branch?
>
> One way to do this would be:
>
>   • maybe create a separate environment for this and change into it
>   • install gatenlp from latest github main branch: pip install -U git+https://github.com/GateNLP/python-gatenlp.git[EXTRAS] where EXTRAS is the list of extras you need also
>   • NOTE: this gatenlp version requires the recent new version 3.0.4 of the GATE Python plugin for the GateWorker which should get used automatically.

Works! Thank you again...

johann-petrak commented 3 years ago

Thanks for testing! Closing