everypolitician / wikidata-area

Fetch information on constituencies / areas from Wikidata
MIT License

If multiple GSS, only gets one #1

Open dracos opened 6 years ago

dracos commented 6 years ago

For example, https://www.wikidata.org/wiki/Q3138286 has two GSS identifiers, one obsolete and one new. It looks like Wikidata keeps the same item ID if the name stays the same. The two boundaries are https://mapit.mysociety.org/area/13122.html (E14000189, pre-2010) and https://mapit.mysociety.org/area/65550.html (E14000720, post-2010). EP is getting the old GSS ID but attaching it to the current constituency, which makes matching with MapIt quite hard.

tmtmtmtm commented 6 years ago

This is a problem with the new identifiers in Wikidata not being marked as 'preferred'. It looks like there are quite a lot of constituencies with duplicate IDs like this:

SELECT ?item ?itemLabel WHERE {
  # items of this type that carry at least two distinct GSS codes (P836)
  ?item wdt:P31 wd:Q27971968 ;
        wdt:P836 ?id1 ;
        wdt:P836 ?id2 .
  FILTER (?id2 > ?id1)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

(473 at time of writing)

I've corrected this one, but doing them all by hand will take a while.

We could adjust our query to ignore claims with end dates, and it may be pragmatic to do so as well as cleaning up the data. But really this should be fixed in Wikidata, as lots of other queries will also Do The Wrong Thing with this data.
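As a rough illustration of the "ignore claims with end dates" idea, here is a minimal Python sketch that works on statements in the shape of Wikidata's entity JSON (`rank`, `mainsnak`, `qualifiers`). It is not what wikidata-area currently does: the function name `current_values` is hypothetical, and the sample data assumes the obsolete code carries an end-time (P582) qualifier, which may not hold for every item.

```python
from typing import Any

END_TIME = "P582"  # Wikidata "end time" qualifier


def current_values(statements: list[dict[str, Any]]) -> list[str]:
    """Return the values of the statements that look current.

    Follows the usual "best rank" rule (preferred wins, deprecated is
    dropped), then also drops any statement with an end-time qualifier.
    """
    usable = [s for s in statements if s.get("rank") != "deprecated"]
    preferred = [s for s in usable if s.get("rank") == "preferred"]
    candidates = preferred or usable
    return [
        s["mainsnak"]["datavalue"]["value"]
        for s in candidates
        if END_TIME not in s.get("qualifiers", {})
        and "datavalue" in s.get("mainsnak", {})  # skip "no value" snaks
    ]


# Illustrative P836 statements for the example item: both at normal rank,
# with an end-time qualifier assumed on the old code (its value is not
# inspected here, only its presence).
statements = [
    {
        "rank": "normal",
        "mainsnak": {"datavalue": {"value": "E14000189"}},
        "qualifiers": {END_TIME: [{}]},
    },
    {
        "rank": "normal",
        "mainsnak": {"datavalue": {"value": "E14000720"}},
    },
]

print(current_values(statements))  # ['E14000720']
```

If the new identifier were marked as preferred in Wikidata, the rank check alone would pick it and the end-date filter would only act as a safety net.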

I don't think PositionStatements can currently set the Ranking of a claim, but this might be a useful time to add that functionality.