everypolitician / compare_with_wikidata

Library for diffing Wikidata and CSVs
MIT License

Example case: US Senate #15

Open tmtmtmtm opened 7 years ago

tmtmtmtm commented 7 years ago

We want to compare the data for current US Senators:

Wikidata:

SELECT DISTINCT ?item ?itemLabel ?startDate ?endDate ?identifier
WHERE {
  ?item p:P39 ?mem .
  ?mem ps:P39 wd:Q13217683 .
  OPTIONAL { ?item wdt:P1157 ?identifier }
  OPTIONAL { ?mem pq:P580 ?startDate }
  OPTIONAL { ?mem pq:P582 ?endDate }
  FILTER (
    (BOUND(?startDate) && (?startDate >= "2017-01-03T00:00:00Z"^^xsd:dateTime)) ||
    (BOUND(?endDate) && (?endDate >= "2017-01-03T00:00:00Z"^^xsd:dateTime)) ||
    (BOUND(?startDate) && !BOUND(?endDate))
  ).
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

vs the scraper at https://morph.io/tmtmtmtm/us-congress-members (where term = 115)

In this case we can reconcile the two lists using the SPARQL ?identifier field above (the "US Congress Bio ID" property, P1157), which should be the same as the identifier__bioguide field in Morph. NB: Morph also has a Wikidata field, but for the sake of this test case we should pretend that it doesn't, as this is meant to be an example of matching via a legislature-specific identifier property in Wikidata.
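A rough sketch of that reconciliation, in Python rather than whatever the library itself ends up using. It assumes both sides have already been loaded as lists of dicts, with the SPARQL results keyed by `identifier` and the Morph rows keyed by `identifier__bioguide`. The function name `reconcile` and the sample IDs are made up for illustration.

```python
def reconcile(wikidata_rows, csv_rows,
              wikidata_key="identifier", csv_key="identifier__bioguide"):
    """Split two lists of dicts into matched / Wikidata-only / CSV-only IDs.

    Rows with a missing or empty key are skipped, since they cannot be
    matched on this identifier at all.
    """
    wikidata_by_id = {r[wikidata_key]: r for r in wikidata_rows if r.get(wikidata_key)}
    csv_by_id = {r[csv_key]: r for r in csv_rows if r.get(csv_key)}

    matched = sorted(wikidata_by_id.keys() & csv_by_id.keys())
    only_in_wikidata = sorted(wikidata_by_id.keys() - csv_by_id.keys())
    only_in_csv = sorted(csv_by_id.keys() - wikidata_by_id.keys())
    return matched, only_in_wikidata, only_in_csv


# Made-up sample rows; the IDs are placeholders, not real Bioguide IDs.
wikidata_rows = [
    {"item": "Q1", "identifier": "X000001"},
    {"item": "Q2", "identifier": "X000002"},
    {"item": "Q3"},  # no identifier in Wikidata, so it cannot be matched
]
csv_rows = [
    {"name": "A", "identifier__bioguide": "X000001"},
    {"name": "C", "identifier__bioguide": "X000003"},
]

matched, only_in_wikidata, only_in_csv = reconcile(wikidata_rows, csv_rows)
```

The two "only in" lists are the interesting output here: they are the rows a diff would need to flag for someone to fix on one side or the other.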

lucychambers commented 7 years ago

This example will have full memberships, i.e. people will not disappear from the lists; they will just receive an end date.
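To make that concrete, here is a small Python sketch of the same three-way test that the FILTER in the query above applies, assuming the start and end dates have been parsed into timezone-aware datetimes (or None when missing). The name `in_current_term` is hypothetical; the cutoff date is the one from the query.

```python
from datetime import datetime, timezone

# Cutoff taken from the FILTER in the SPARQL query.
TERM_START = datetime(2017, 1, 3, tzinfo=timezone.utc)


def in_current_term(start, end, term_start=TERM_START):
    """Mirror the query's FILTER: keep a membership if it started in the term,
    ended in the term, or has a start date and no end date yet."""
    started_in_term = start is not None and start >= term_start
    ended_in_term = end is not None and end >= term_start
    still_open = start is not None and end is None
    return started_in_term or ended_in_term or still_open


def utc(year, month, day):
    """Small helper for building timezone-aware dates in the examples."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# A member who left mid-term stays in the list, just with an end date.
left_mid_term = in_current_term(utc(2011, 1, 5), utc(2017, 2, 8))
# A membership that ended before the term began is excluded.
ended_before_term = in_current_term(utc(2011, 1, 5), utc(2015, 1, 3))
```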

TODO during implementation: Decide whether we need to further split this ticket up.

lucychambers commented 7 years ago

Lucy and Tony to set this up as our first example.

chrismytton commented 7 years ago

This is now working, yay! https://www.wikidata.org/wiki/User:Chris_Mytton/sandbox/daff_us_senate

I'm going to hold off closing this ticket, though, until we've moved to Toolforge (#36), as it's not currently possible to refresh the prompts because one of Heroku's IP ranges is blocked (see https://github.com/everypolitician/compare_with_wikidata/issues/29#issuecomment-321856875 for the full error message).

lucychambers commented 7 years ago

Need to test this layout when things appear in the SPARQL results but not in the CSV. The layout implies column grouping (e.g. the first two columns look grouped), which is misleading. Need to tidy up the key.