ahalterman / mordecai3

Full text geoparsing/toponym resolution with event geolocation
MIT License

Location entities - `"FAC"` and `"NORP"` #18

Closed DanShatford closed 5 months ago

DanShatford commented 1 year ago

In the `geoparse.doc_to_ex_expanded` function, there are two different lists of labels for entities:

['GPE', 'LOC', 'EVENT_LOC', 'NORP']

and

['GPE', 'LOC', 'EVENT_LOC', 'FAC']

Is this intentional?

https://github.com/ahalterman/mordecai3/blob/main/mordecai3/geoparse.py#L118-L120

ahalterman commented 5 months ago

Thanks for raising that question. I had to remember why I did it. We use the first list only for context when picking the best location. For that purpose, NORPs are very useful (e.g. "Lebanese" in "The city of Tripoli on the Lebanese coast"), but we don't want to geoparse them. Conversely, FACs aren't as helpful for context, but we do want to geoparse them. I added a comment to the code because it definitely does look like an inconsistency.
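
For anyone reading this later, here is a minimal sketch of the distinction between the two lists. It is not the actual mordecai3 code: the names `CONTEXT_LABELS`, `GEOPARSE_LABELS`, and `split_entities` are hypothetical, and the entities are hand-labeled `(text, label)` tuples rather than real spaCy output.

```python
# Hypothetical illustration only; names and structure are assumptions,
# not taken from mordecai3's geoparse.py.

# Labels used only as context when picking the best location.
CONTEXT_LABELS = {"GPE", "LOC", "EVENT_LOC", "NORP"}

# Labels for entities that should actually be geoparsed.
GEOPARSE_LABELS = {"GPE", "LOC", "EVENT_LOC", "FAC"}


def split_entities(ents):
    """Split (text, label) pairs into context entities and entities to geoparse."""
    context = [text for text, label in ents if label in CONTEXT_LABELS]
    to_geoparse = [text for text, label in ents if label in GEOPARSE_LABELS]
    return context, to_geoparse


# Hand-labeled example entities (assumed labels, not model output).
ents = [
    ("Tripoli", "GPE"),
    ("Lebanese", "NORP"),
    ("the central hospital", "FAC"),
]

context, to_geoparse = split_entities(ents)
print(context)      # ['Tripoli', 'Lebanese']
print(to_geoparse)  # ['Tripoli', 'the central hospital']
```

In this sketch, "Lebanese" helps disambiguate which Tripoli is meant but is never resolved to a location itself, while the facility is resolved but does not feed into the context.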