TheScienceMuseum / elastic-wikidata

CLI for loading Wikidata subsets (or all of it) into Elasticsearch
https://www.sciencemuseumgroup.org.uk/project/heritage-connector/
MIT License
67 stars, 7 forks

Handle globe coordinates #21

Closed: kinoute closed this issue 2 years ago

kinoute commented 2 years ago

Hello,

While importing entities with P31 and P625 from the full dump, we found that P31 values were imported into ES correctly but P625 values were not: the P625 field ends up as an empty list in ES.

Looking at both dump_to_es.py and wd_entities.py, it appears that globe coordinates (P625) are not handled properly and raise a KeyError that is never shown to the user here:

https://github.com/TheScienceMuseum/elastic-wikidata/blob/7447cd4b77384a8a8025f4963953fd3b132b97e3/elastic_wikidata/wd_entities.py#L180-L185

Here is a sample of a P625 entity inside the Wikidata dump:

{
    "mainsnak": {
        "snaktype": "value",
        "property": "P625",
        "datavalue": {
            "value": {
                "latitude": 27.37833,
                "longitude": 102.54149,
                "altitude": null,
                "precision": null,
                "globe": "http://www.wikidata.org/entity/Q2"
            },
            "type": "globecoordinate"
        },
        "datatype": "globe-coordinate"
    },
    "type": "statement",
    "id": "q1200311$319D21B7-8D64-4D12-BF46-837158D823AA",
    "rank": "normal"
}
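
To make the structure concrete, here is a minimal sketch of reading the coordinates out of a statement shaped like the one above, written as a Python dict. The helper name `extract_coordinates` is hypothetical and not part of elastic-wikidata. Note that the check is on the datavalue `type` ("globecoordinate"), not on the snak `datatype` ("globe-coordinate"), and that snaks without a value carry no `datavalue` at all.

```python
from typing import Optional


def extract_coordinates(statement: dict) -> Optional[dict]:
    """Return {"lat": ..., "lon": ...} for a globecoordinate statement, else None.

    Hypothetical helper for illustration; not part of elastic-wikidata.
    """
    mainsnak = statement.get("mainsnak", {})
    # "novalue" / "somevalue" snaks have no datavalue to read.
    if mainsnak.get("snaktype") != "value":
        return None
    datavalue = mainsnak.get("datavalue", {})
    if datavalue.get("type") != "globecoordinate":
        return None
    value = datavalue["value"]
    return {"lat": value["latitude"], "lon": value["longitude"]}


statement = {
    "mainsnak": {
        "snaktype": "value",
        "property": "P625",
        "datavalue": {
            "value": {
                "latitude": 27.37833,
                "longitude": 102.54149,
                "altitude": None,
                "precision": None,
                "globe": "http://www.wikidata.org/entity/Q2",
            },
            "type": "globecoordinate",
        },
        "datatype": "globe-coordinate",
    },
    "type": "statement",
    "rank": "normal",
}

print(extract_coordinates(statement))  # {'lat': 27.37833, 'lon': 102.54149}
```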

The datavalue type is "globecoordinate", but wd_type_mapping in wd_entities.py has no entry for that type:

https://github.com/TheScienceMuseum/elastic-wikidata/blob/7447cd4b77384a8a8025f4963953fd3b132b97e3/elastic_wikidata/wd_entities.py#L136-L141
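
The failure mode can be reproduced in isolation. The sketch below is an assumption about the general shape of the problem, not a copy of the code in wd_entities.py: the mapping entries, the function name `simplify_values`, and the broad `except KeyError` are all illustrative, and the entity ID is an arbitrary example.

```python
# Illustrative mapping from datavalue type to the key holding the useful value.
# The entries are assumptions for this sketch; "globecoordinate" is absent.
TYPE_TO_VALUE_KEY = {
    "wikibase-entityid": "id",
    "time": "time",
    "monolingualtext": "text",
    "quantity": "amount",
}


def simplify_values(statements: list) -> list:
    """Collect one simplified value per statement, skipping any that raise KeyError."""
    values = []
    for statement in statements:
        try:
            datavalue = statement["mainsnak"]["datavalue"]
            # Raises KeyError when the type has no entry in the mapping.
            value_key = TYPE_TO_VALUE_KEY[datavalue["type"]]
            values.append(datavalue["value"][value_key])
        except KeyError:
            # Swallowing the error hides the unmapped type from the user.
            continue
    return values


p31 = [{"mainsnak": {"datavalue": {"type": "wikibase-entityid",
                                    "value": {"id": "Q486972"}}}}]
p625 = [{"mainsnak": {"datavalue": {"type": "globecoordinate",
                                     "value": {"latitude": 27.37833,
                                               "longitude": 102.54149}}}}]

print(simplify_values(p31))   # ['Q486972']
print(simplify_values(p625))  # []
```

Under these assumptions the P625 statements vanish without any error message, which matches the empty list seen in ES.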

As a result, the value is never processed. To fix this for our project, we added a condition for this case, available in this PR. A better solution could of course be integrated instead.
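
One way such a condition could look is sketched below. This is not the code from the PR: the function name `simplify_datavalue`, the mapping entries, and the choice to return a `{"lat", "lon"}` dict (a shape that Elasticsearch's geo_point field type accepts) are assumptions for illustration.

```python
# Illustrative mapping; entries are assumptions, not copied from the project.
TYPE_TO_VALUE_KEY = {
    "wikibase-entityid": "id",
    "time": "time",
    "monolingualtext": "text",
    "quantity": "amount",
}


def simplify_datavalue(datavalue: dict):
    """Reduce a Wikidata datavalue to the single value worth indexing."""
    dv_type = datavalue["type"]
    value = datavalue["value"]
    if dv_type == "string":
        # String datavalues hold the value directly.
        return value
    if dv_type == "globecoordinate":
        # The added case: keep both coordinates rather than one mapped key.
        return {"lat": value["latitude"], "lon": value["longitude"]}
    if dv_type in TYPE_TO_VALUE_KEY:
        return value[TYPE_TO_VALUE_KEY[dv_type]]
    # Fail loudly so an unmapped type is noticed instead of silently dropped.
    raise ValueError(f"Unhandled datavalue type: {dv_type!r}")


coords = simplify_datavalue({
    "type": "globecoordinate",
    "value": {
        "latitude": 27.37833,
        "longitude": 102.54149,
        "globe": "http://www.wikidata.org/entity/Q2",
    },
})
print(coords)  # {'lat': 27.37833, 'lon': 102.54149}
```

Raising on unknown types is a design choice of this sketch: it trades tolerance for visibility, so a future unmapped type would be reported rather than dropped.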

Regards

jamieu commented 2 years ago

Thanks Kinoute,

I don't think we ever used the coordinates, which is probably why we never spotted it.

I have merged your PR. Thanks for raising it; hopefully your fix will prove useful to others down the line as well.