lexnlp.extract.en.geoentities.get_geoentity_annotations returning the wrong location indexes

>>> import lexnlp.extract.en.geoentities
>>> text = "This Contract (“Contract”) is entered into by and between the City of Detroit, a Michigan municipal corporation"
>>> for geoentity in lexnlp.extract.en.geoentities.get_geoentity_annotations(text, _CONFIG):
...     print(geoentity)
Michigan [geoentity] at (86..95), loc: en

Currently get_geoentity_annotations returns the wrong location indexes, as shown in the example above; the correct output should be Michigan [geoentity] at (82..91), loc: en. I noticed that this happens when the text variable contains punctuation marks: each time the get_geoentity_annotations parser encounters a punctuation mark (e.g. ,, (, ), ”, “), the location index is incremented by 2. As a result, any geoentity that occurs before the first punctuation mark gets the right location indexes, while any geoentity that occurs after one gets the wrong indexes.
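
To make the mismatch easy to check, here is a minimal sketch that computes baseline character offsets with the standard library only. It does not call LexNLP; the helper names find_spans and span_matches are hypothetical and exist only for this illustration. LexNLP's own offset convention (for example, whether a span includes adjacent whitespace) may differ, so treat the printed numbers as a baseline for comparison rather than as the library's expected output.

```python
import re

# Same sentence as in the report, including the curly quotes.
text = ("This Contract (“Contract”) is entered into by and between "
        "the City of Detroit, a Michigan municipal corporation")


def find_spans(needle, haystack):
    """Return (start, end) character offsets of every occurrence of needle."""
    return [m.span() for m in re.finditer(re.escape(needle), haystack)]


def span_matches(haystack, start, end, expected):
    """True when haystack[start:end], ignoring surrounding whitespace, equals expected."""
    return haystack[start:end].strip() == expected


spans = find_spans("Michigan", text)
for start, end in spans:
    # Slicing the original string shows what each span really covers.
    print(f"Michigan at ({start}..{end}) -> {text[start:end]!r}")

# The span reported by get_geoentity_annotations in this issue.
print("reported span covers:", repr(text[86:95]))
print("reported span is correct:", span_matches(text, 86, 95, "Michigan"))
```

Slicing the original string with a reported span is a quick way to confirm whether an annotation's coordinates point at the entity or have drifted past it.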

LexPredict / lexpredict-lexnlp

lexnlp.extract.en.geoentities.get_geoentity_annotations returning the wrong location indexes #40