>>> import lexnlp.extract.en.geoentities
>>> text = "This Contract (“Contract”) is entered into by and between the City of Detroit, a Michigan municipal corporation"
>>> for geoentity in lexnlp.extract.en.geoentities.get_geoentity_annotations(text, _CONFIG):
...     print(geoentity)
Michigan [geoentity] at (86..95), loc: en
Currently get_geoentity_annotations returns the wrong location indexes, as shown in the example above. The correct output should be Michigan [geoentity] at (82..91), loc: en. I noticed that this behavior occurs when the text variable contains punctuation marks: each time the get_geoentity_annotations parser encounters a punctuation mark (e.g. ",", "(", ")", "”", "“"), the location index is incremented by 2. As a result, any geoentity that occurs before the first punctuation mark gets the correct location indexes, while those that occur after it get the wrong ones.
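One way to measure the drift without relying on the library is to compare a reported start index against positions found by plain string search. The following is only a minimal sketch: `find_spans` and `start_drift` are hypothetical helper names that are not part of lexnlp, and the reported value 86 is copied from the output above. It assumes zero-based offsets with an exclusive end, which may not match the convention lexnlp uses for its coordinates.

```python
import re

# Same text as in the report above.
TEXT = ("This Contract (“Contract”) is entered into by and between "
        "the City of Detroit, a Michigan municipal corporation")


def find_spans(text, name):
    """Return zero-based (start, end) spans, end exclusive, of every
    literal occurrence of `name` in `text` (hypothetical helper)."""
    return [m.span() for m in re.finditer(re.escape(name), text)]


def start_drift(text, name, reported_start):
    """Return reported_start minus the closest start found by plain
    search, or None if `name` does not occur in `text`
    (hypothetical helper)."""
    spans = find_spans(text, name)
    if not spans:
        return None
    nearest = min((s for s, _ in spans), key=lambda s: abs(s - reported_start))
    return reported_start - nearest


if __name__ == "__main__":
    # Spans found by plain search, independent of the library.
    print(find_spans(TEXT, "Michigan"))
    # Drift of the start index reported in the output above.
    print(start_drift(TEXT, "Michigan", 86))
```

Running the same check on entities placed before and after the first punctuation mark should show where the drift begins and how it grows with each additional mark.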