johann-petrak closed this issue 3 years ago.
Copied traceback info:
IndexError                                Traceback (most recent call last)
in <module>
      8 doc2 = Annie(doc1)
      9 properdoc = ProperDoc(doc1)
---> 10 gazdoc = GazDet(properdoc)
     11 for ann in gazdoc.annset("Resume"):
     12     doc2.annset("Resume").add_ann(ann)

in GazDet(doc)
      5     for typ in details:
      6         tgaz = TokenGazetteer("data/" + typ + ".def", fmt="gate-def", annset="", outset="Resume", outtype=typ)
----> 7         gazdoc = tgaz(doc)
      8     return gazdoc

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in __call__(self, doc, annset, tokentype, septype, splittype, withintype, all, skip)
    697         for segment_start, segment_end in segment_offs:
    698             tokens = list(anns.within(segment_start, segment_end))
--> 699             for matches in self.find_all(tokens, doc=doc):
    700                 for match in matches:
    701                     starttoken = tokens[match.start]

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in find_all(self, tokens, doc, all, skip, fromidx, toidx, endidx, matchfunc)
    617         idx = fromidx
    618         while idx <= toidx:
--> 619             matches, maxlen, idx = self.find(
    620                 tokens,
    621                 doc=doc,

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in find(self, tokens, doc, all, fromidx, toidx, endidx, matchfunc)
    550             endidx = len(tokens)
    551         while idx <= toidx:
--> 552             matches, long = self.match(
    553                 tokens, idx=idx, doc=doc, all=all, endidx=endidx, matchfunc=matchfunc
    554             )

~\miniconda3\lib\site-packages\gatenlp\processing\gazetteer.py in match(self, tokens, doc, all, idx, endidx, matchfunc)
    454         while j <= endidx:
    455             if node.nodes:
--> 456                 token = tokens[j]
    457                 if token.type == self.splittype:
    458                     break

IndexError: list index out of range
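One reading of the frames above (an interpretation only; the thread does not say what the actual fix was): `find` sets `endidx = len(tokens)`, and `match` then loops `while j <= endidx` and reads `tokens[j]`. When the current gazetteer trie node still has child nodes at the last token of a segment (a longer entry starts matching but the segment ends), `j` reaches `len(tokens)` and the lookup raises `IndexError`. The sketch below is a simplified, hypothetical token-trie matcher, not gatenlp's code: `TrieNode`, `add_entry` and `match_at` are illustrative names, tokens are plain strings instead of annotations, and split tokens are ignored. It treats `endidx` as an exclusive bound so a partial match running off the end simply stops.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrieNode:
    """Hypothetical trie node: children keyed by token string, plus optional entry data."""
    nodes: Dict[str, "TrieNode"] = field(default_factory=dict)
    data: Optional[str] = None  # set only on nodes where a complete entry ends


def add_entry(root: TrieNode, tokens: List[str], data: str) -> None:
    """Insert one multi-token gazetteer entry into the trie."""
    node = root
    for tok in tokens:
        node = node.nodes.setdefault(tok, TrieNode())
    node.data = data


def match_at(root: TrieNode, tokens: List[str], idx: int,
             endidx: Optional[int] = None) -> Optional[Tuple[int, int, str]]:
    """Return the longest match (start, end, data) beginning at idx, or None.

    endidx is EXCLUSIVE: only tokens[idx:endidx] are inspected.
    With `j <= endidx` instead, a node that still has children at the
    last token would read tokens[len(tokens)] and raise IndexError.
    """
    if endidx is None:
        endidx = len(tokens)
    node = root
    best = None
    j = idx
    while j < endidx and node.nodes:
        node = node.nodes.get(tokens[j])
        if node is None:  # no entry continues with this token
            break
        j += 1
        if node.data is not None:  # a complete entry ends here; remember the longest
            best = (idx, j, node.data)
    return best


# Usage: "New York" matches, while the longer entry "New York City"
# would need a token past the end of the list.
root = TrieNode()
add_entry(root, ["New", "York"], "Location")
add_entry(root, ["New", "York", "City"], "Location")
tokens = ["in", "New", "York"]
print(match_at(root, tokens, 1))  # (1, 3, 'Location')
print(match_at(root, tokens, 0))  # None
```

With the exclusive bound, the prefix of "New York City" that reaches the end of the token list ends the loop cleanly instead of indexing past it.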
@mdorkhah, would you be able to (privately) share a minimal test case?
Sure, I just sent you an email.
Thanks - I was not able to get that running yet, but I think I have actually found the bug already! :)
To test this, would you be able to install gatenlp from the very latest version of the GitHub main branch?
One way to do this would be:
- maybe create a separate environment for this and activate it
- install gatenlp from the latest GitHub main branch:
  pip install -U "gatenlp[EXTRAS] @ git+https://github.com/GateNLP/python-gatenlp.git"
  where EXTRAS is the list of extras you also need
- NOTE: this gatenlp version requires the recently released version 3.0.4 of the GATE Python plugin for the GateWorker, which should get used automatically.
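After installing from the main branch, one way to confirm which gatenlp build Python actually picks up (for example, inside the separate environment mentioned above) is to read the installed distribution metadata with the standard library. `installed_version` is a small hypothetical helper, not part of gatenlp; it only reports what pip installed and does not check the GATE Python plugin version used by the GateWorker.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Shows the version string of whichever gatenlp build is installed in this environment, or None.
    print("gatenlp:", installed_version("gatenlp"))
```

`importlib.metadata` is available from Python 3.8 onward.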
Works! Thank you again...
Thanks for testing! Closing
See https://github.com/GateNLP/python-gatenlp/discussions/92