Closed: nicksunderland closed this issue 1 year ago
Please give all details about your system and software used:
Operating System: MAC
Python Version: 3.11
How was gatenlp installed: version 1.0.9dev0
Describe the bug
I can't seem to match on an annotation's text using the 'text=' parameter.
To Reproduce
from nltk.tokenize.regexp import WhitespaceTokenizer
from gatenlp import Document
from gatenlp.processing.tokenizer import NLTKTokenizer
from gatenlp.pam.pampac import *

# Token text
text = """foo bar baz"""
doc1 = Document(text)
tok1 = NLTKTokenizer(nltk_tokenizer=WhitespaceTokenizer())
doc1 = tok1(doc1)
print("---------")
for ann in doc1.annset():
    print(doc1[ann].ljust(4, " ") + " - " + str(ann))

# Add annotation by document text
pat1 = Text(text="foo")
act1 = AddAnn(type="FOUND_FOO_IN_TEXT")
rule = Rule(pat1, act1)
pamp = Pampac(rule, skip="longest", select="first")
annt = PampacAnnotator(pamp, annspec=[("", "Token")], outset_name="")
annt(doc1)
print("----As expected using Text()-----")
for ann in doc1.annset(""):
    print(doc1[ann] + " - " + str(ann))

# Add annotation by annotation text
text = """foo bar baz"""
doc1 = Document(text)
tok1 = NLTKTokenizer(nltk_tokenizer=WhitespaceTokenizer())
doc1 = tok1(doc1)
pat1 = AnnAt(type="Token", text="foo")
act1 = AddAnn(type="FOUND_FOO_IN_ANN_TEXT")
rule = Rule(pat1, act1)
pamp = Pampac(rule, skip="longest", select="first")
annt = PampacAnnotator(pamp, annspec=[("", "Token")], outset_name="")
annt(doc1)
print("----Not what I expected using AnnAt(text=), tags all tokens-----")
for ann in doc1.annset(""):
    print(doc1[ann] + " - " + str(ann))
Output:
---------
foo - Annotation(0,3,Token,features=Features({}),id=0)
bar - Annotation(4,7,Token,features=Features({}),id=1)
baz - Annotation(8,11,Token,features=Features({}),id=2)
----As expected using Text()-----
foo - Annotation(0,3,Token,features=Features({}),id=0)
foo - Annotation(0,3,FOUND_FOO_IN_TEXT,features=Features({}),id=3)
bar - Annotation(4,7,Token,features=Features({}),id=1)
baz - Annotation(8,11,Token,features=Features({}),id=2)
----Not what I expected using AnnAt(text=), tags all tokens-----
foo - Annotation(0,3,Token,features=Features({}),id=0)
foo - Annotation(0,3,FOUND_FOO_IN_ANN_TEXT,features=Features({}),id=3)
bar - Annotation(4,7,Token,features=Features({}),id=1)
bar - Annotation(4,7,FOUND_FOO_IN_ANN_TEXT,features=Features({}),id=4)
baz - Annotation(8,11,Token,features=Features({}),id=2)
baz - Annotation(8,11,FOUND_FOO_IN_ANN_TEXT,features=Features({}),id=5)
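For comparison, here is a rough sketch of the behaviour the report expects from `AnnAt(type="Token", text="foo")`: only the token whose covered text is "foo" should match. This is plain Python and does not use gatenlp at all. The helpers `whitespace_tokens` and `match_ann_text` and the `(start, end, type)` tuples are hypothetical stand-ins for illustration, not gatenlp's API or its implementation.

```python
# Plain-Python illustration of the expected matching semantics.
# Assumption: annotations are modelled as (start, end, type) tuples;
# nothing here is part of gatenlp.

text = "foo bar baz"


def whitespace_tokens(text):
    """Return (start, end, "Token") spans for whitespace-separated tokens."""
    spans = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        spans.append((start, end, "Token"))
        pos = end
    return spans


def match_ann_text(text, spans, ann_type, wanted):
    """Keep spans of the given type whose covered text equals `wanted`."""
    return [
        (start, end, typ)
        for (start, end, typ) in spans
        if typ == ann_type and text[start:end] == wanted
    ]


tokens = whitespace_tokens(text)
found = match_ann_text(text, tokens, "Token", "foo")
print(found)  # [(0, 3, 'Token')]
```

Under these assumptions, only the span (0, 3) is selected, which corresponds to a single FOUND_FOO_IN_ANN_TEXT annotation on "foo" rather than one on every token.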
Thank you for reporting and providing the code to reproduce!
No problem, thanks for the fix.