LexPredict / lexpredict-lexnlp

LexNLP by LexPredict
GNU Affero General Public License v3.0

lexnlp.extract.en.geoentities.get_geoentity_annotations returning the wrong location indexes #40

Open Ra-you opened 4 years ago

Ra-you commented 4 years ago

>>> import lexnlp.extract.en.geoentities
>>> text = "This Contract (“Contract”) is entered into by and between the City of Detroit, a Michigan municipal corporation"
>>> for geoentity in lexnlp.extract.en.geoentities.get_geoentity_annotations(text, _CONFIG):
...     print(geoentity)
Michigan [geoentity] at (86..95), loc: en

Currently, get_geoentity_annotations returns the wrong location indexes, as shown in the example above. The correct output should be Michigan [geoentity] at (82..91), loc: en. I noticed that this happens when the text variable contains punctuation marks: each time the get_geoentity_annotations parser encounters a punctuation mark (e.g. a comma or a parenthesis), the location index is incremented by 2. As a result, any geoentity that occurs before the first punctuation mark gets the correct location indexes, while those that occur after it get the wrong ones.
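For anyone trying to reproduce this, here is a minimal sketch for checking the offsets independently of LexNLP. It uses only the Python standard library; the helper names `find_spans` and `punctuation_before` are hypothetical and not part of LexNLP. It locates the entity with a plain 0-based string search and lists the punctuation characters that precede it. A plain search may not follow the same span convention as the annotation output (for example, whether adjacent whitespace is included), so it is probably safer to compare the shift between entities than the absolute values.

```python
import re
import unicodedata

# Sample text copied from the report above.
TEXT = (
    "This Contract (“Contract”) is entered into by and between "
    "the City of Detroit, a Michigan municipal corporation"
)


def find_spans(text, needle):
    """Return (start, end) character offsets of every occurrence of needle.

    Offsets are 0-based and end-exclusive, as reported by re.finditer.
    """
    return [(m.start(), m.end()) for m in re.finditer(re.escape(needle), text)]


def punctuation_before(text, offset):
    """Return (index, character) pairs for punctuation found before offset.

    A character counts as punctuation when its Unicode general category
    starts with "P", which covers ASCII marks as well as curly quotes.
    """
    return [
        (i, ch)
        for i, ch in enumerate(text[:offset])
        if unicodedata.category(ch).startswith("P")
    ]


if __name__ == "__main__":
    for start, end in find_spans(TEXT, "Michigan"):
        print("Michigan found at", (start, end))
        print("punctuation before it:", punctuation_before(TEXT, start))
```

Comparing the start offset printed here with the one from get_geoentity_annotations, for entities placed before and after each punctuation mark, should show which characters actually move the index.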