clulab / eidos

Machine reading system for World Modelers
Apache License 2.0

allow GADM IDs for geographical locations #688

Status: open (bethard opened this issue 4 years ago)

bethard commented 4 years ago

We need to allow GADM IDs (e.g., ETH.9.7.3_1) in addition to GeoNames IDs (e.g., 6783700) for normalizing locations. The geonorm library already supports this. (No changes were needed.) I have constructed a new geonorm index that includes all of GeoNames, plus some Ethiopian woredas from GADM that weren't in GeoNames:
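The two ID styles can be told apart by their shape alone. A GeoNames ID is purely numeric. A GADM ID such as ETH.9.7.3_1 is a three-letter country code, then dot-separated administrative indices, then a `_` and a version number. The sketch below is a hypothetical helper that is not part of Eidos or geonorm. It assumes that GADM shape, and so it does not cover country-level GADM IDs such as a bare "ETH".

```scala
// Hypothetical helper, not part of Eidos or geonorm: guesses which gazetteer a
// location ID string comes from, based only on its surface form.
object GeoIdKind {
  sealed trait Kind
  case object GeoNames extends Kind // purely numeric, e.g. "6783700"
  case object Gadm extends Kind     // e.g. "ETH.9.7.3_1"
  case object Unknown extends Kind

  // Regex extractors in a match must cover the whole string, so no anchors are needed.
  private val GeoNamesPattern = """\d+""".r
  // Assumed GADM shape: 3 uppercase letters, one or more ".N" segments, then "_V".
  private val GadmPattern = """[A-Z]{3}(?:\.\d+)+_\d+""".r

  def classify(id: String): Kind = id match {
    case GeoNamesPattern() => GeoNames
    case GadmPattern()     => Gadm
    case _                 => Unknown
  }

  def main(args: Array[String]): Unit = {
    // Prints one "<id> -> <kind>" line per example ID.
    Seq("6783700", "ETH.9.7.3_1", "Addis Ababa").foreach { id =>
      println(s"$id -> ${classify(id)}")
    }
  }
}
```

Because both kinds are plain strings, a `String`-typed ID field can hold either kind without any conversion.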

http://clulab.cs.arizona.edu/models/geonames+woredas.zip

To load this new index, some changes to Eidos are needed. Eidos assumes that geo IDs will be Ints:

https://github.com/clulab/eidos/blob/master/src/main/scala/org/clulab/wm/eidos/context/GeoNormFinder.scala#L23
https://github.com/clulab/eidos/blob/master/src/main/scala/org/clulab/wm/eidos/context/GeoNormFinder.scala#L137
(and other places)

Those Ints will need to be changed to Strings to work with the new index.

As far as I know, there was no reason to use Ints in the first place, so while it's a pain to change a bunch of types in Eidos, I think a String is a better representation for an ID anyway.
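The sketch below shows why an `Int` field cannot carry the new IDs, while a `String` field carries both kinds unchanged. `IntGeoLocation`, `GeoLocation`, and `asIntId` are hypothetical names for illustration. They are not the actual types or methods in `GeoNormFinder.scala`.

```scala
import scala.util.Try

// Hypothetical types for illustration only; they are not Eidos's actual classes.
object GeoIdMigration {
  // Before: an Int-typed ID can only hold numeric GeoNames IDs.
  final case class IntGeoLocation(geoId: Int)

  // After: a String-typed ID holds GeoNames and GADM IDs as-is.
  final case class GeoLocation(geoId: String)

  // Reading an index ID as an Int fails (None) for GADM IDs such as "ETH.9.7.3_1".
  def asIntId(raw: String): Option[Int] = Try(raw.toInt).toOption

  def main(args: Array[String]): Unit = {
    val rawIds = Seq("6783700", "ETH.9.7.3_1")
    // Only the numeric GeoNames ID survives conversion to Int.
    rawIds.foreach(raw => println(s"$raw as Int: ${asIntId(raw)}"))
    // With String IDs, no conversion is needed and nothing is lost.
    rawIds.map(GeoLocation(_)).foreach(println)
  }
}
```

Keeping IDs as opaque strings also means any later gazetteer with its own ID format will not require another type change.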

@EgoLaparra: can you look into this today? I believe @kwalcock wants all changes in today so that he can merge them, run Eidos on the documents over the weekend, and send Ben the new results on Monday.

kwalcock commented 4 years ago

Thanks again for the heads-up. The run probably won't start until tomorrow. Documents are still coming in, and the ontologies are still being changed.

EgoLaparra commented 4 years ago

Yes, I'll start working on this right now.