gandersen101 / spaczz

Fuzzy matching and more functionality for spaCy.
MIT License
252 stars, 27 forks

Bug fix for German compound words in RegexSearcher #66

Closed. JonasHablitzel closed this 2 years ago

JonasHablitzel commented 2 years ago

Hello,

I encountered a bug when searching for a substring inside a token. When the subword is at the end of a longer word, it isn't found, e.g.

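import spacy
from spaczz.search import RegexSearcher

nlp = spacy.blank("en")  # assumption: any pipeline with a tokenizer reproduces this
text = "We want to identify a German word combination Aussagekraft"
doc = nlp(text)
search = RegexSearcher(nlp.vocab)
matches = search.match(doc, r"(kraft|Kraft)")  # matches == [], although "Aussagekraft" ends in "kraft"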

This change should fix the bug, and I also added a test case.
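
To illustrate the kind of miss described above, here is a minimal sketch in plain Python. It is not spaczz's implementation: it uses only the standard `re` module and whitespace tokens, and the helper names (`token_spans`, `exact_token_matches`, `expanded_token_matches`) are hypothetical. It assumes one plausible cause, namely that a character-level regex match is dropped when its offsets do not line up with token boundaries, and shows how expanding the match to the enclosing token recovers it.

```python
import re


def token_spans(text):
    """Return (start, end) character offsets of whitespace-delimited tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def exact_token_matches(text, pattern):
    """Keep only regex matches whose offsets coincide with token boundaries."""
    spans = token_spans(text)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return [
        m.group()
        for m in re.finditer(pattern, text)
        if m.start() in starts and m.end() in ends
    ]


def expanded_token_matches(text, pattern):
    """Map each regex match to the token index range it overlaps.

    Returns (first_token, last_token_exclusive, matched_text) tuples.
    """
    spans = token_spans(text)
    results = []
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            continue  # ignore empty matches
        overlapping = [
            i for i, (s, e) in enumerate(spans) if s < m.end() and e > m.start()
        ]
        if overlapping:
            results.append((overlapping[0], overlapping[-1] + 1, m.group()))
    return results


text = "We want to identify a German word combination Aussagekraft"
pattern = r"(kraft|Kraft)"

strict = exact_token_matches(text, pattern)      # "kraft" starts mid-token, so it is dropped
expanded = expanded_token_matches(text, pattern)  # match is attached to the enclosing token

print(strict)
print(expanded)
```

In this sketch the strict variant returns nothing for "Aussagekraft" because the match begins in the middle of the token, while the expanding variant reports the match against the last token in the sentence. Whether the actual fix in this pull request works the same way is not stated here.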