everypolitician / democratic-commons-tasks

An issues-only repository for tracking work for the Democratic Commons project

build the Legislature Browser #36

Open mhl opened 6 years ago

mhl commented 6 years ago

What does it do?

This site would let you see, for each country in the world (possibly divided into those on our hit-list for 2018 and others), their national and sub-national legislatures and some of the properties of those legislatures, such as seat count and constituency count, based entirely on Wikidata. These legislatures would include:

Similarly, for each of the above areas, it should show the head of government, and the office they hold (via the properties we used in our mission to do this at a national level)

The idea is that if the Wikidata modelling is good, you should be able to list all of these just from Wikidata queries, starting from the country's item ID.
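As an illustrative sketch (not from the original issue), assuming the standard Wikidata identifiers Q11204 ("legislature"), P31/P279 ("instance of"/"subclass of") and P17 ("country"), such a query starting from a country's item ID might look like:

```ruby
# Build a SPARQL query listing the legislatures of a country, starting
# from the country's Wikidata item ID (e.g. "Q31" for Belgium).
# Q11204 = legislature, P31 = instance of, P279 = subclass of, P17 = country.
def legislatures_query(country_item_id)
  <<~SPARQL
    SELECT ?legislature ?legislatureLabel WHERE {
      ?legislature wdt:P31/wdt:P279* wd:Q11204 ;
                   wdt:P17 wd:#{country_item_id} .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
  SPARQL
end

puts legislatures_query('Q31')
```

Whether sub-national legislatures are all reachable this way depends on how well the modelling has been done for each country, which is exactly what the tool would surface.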

Future Work

If someone is logged in to Wikidata, the tool should allow them to add missing data and correct erroneous or out-of-date information. This will be an important addition that we will likely need very soon, but it should be considered out of scope for the very first version.

Who are the users?

Initially this will be mySociety people on the Democratic Commons project, comparing what we know about the country (e.g. from Wikipedia, general knowledge, other research) with what's in Wikidata, so we can then update Wikidata. Shortly afterwards, it'll be our partners in other countries using it for the same process.

Other notes

The page for a country might easily list 100 legislatures - some countries (e.g. the US) have lots of subnational legislatures.

The idea here is that we're building a minimally useful tool for checking and updating this data. It should be useful for us (we have a real user need for this), but it's important in software development terms too: this will be the first tool we've written that allows federated login to Wikidata and making edits as the logged-in user, and there's some risk there. We'll need to write several tools that do that for this project.

Existing work

@tmtmtmtm did a spike when we were in New York that was like everypolitician.org, but which was based entirely on Wikidata. At the top levels of the site, this is essentially a legislature browser, so hopefully some of that work could be reused, or we can learn from that.

crowbot commented 6 years ago

> @tmtmtmtm did a spike when we were in New York that was like everypolitician.org, but which was based entirely on Wikidata.

@tmtmtmtm could you paste a link to that work?

mhl commented 6 years ago

We would now suggest that this is initially implemented as (unlinked-to) sub-pages of a country on everypolitician.org, e.g. http://everypolitician.org/belgium/wikidata. That site is deployed as static files, generated from the viewer-sinatra Sinatra app by crawling it with recursive wget from a start page that links to enough pages to crawl everything that should be deployed. Since these /wikidata pages won't initially be linked to from the rest of the site, links to them will need to be included on the start page.

There are various options for how these new pages could be implemented:

  1. Query Wikidata in Ruby in the Sinatra app, and find a way to make seeing the effect of your updates easy, e.g.:
    • a) Including links to equivalent SPARQL queries on the Wikidata Query Service in-page
    • b) Add a way of triggering a rebuild just of that page
    • c) Have client-side JavaScript updating the page as a progressive enhancement, so you get up-to-the-minute page rendering when JavaScript is working and network conditions are good, but maybe slightly out-of-date data otherwise.
  2. Only query Wikidata from client-side JavaScript and build the page on the client from that.
  3. Only query Wikidata from client-side JavaScript and build / rewrite the page on the client from that, but also switch to crawling these pages with Capybara / PhantomJS (so this is like option 1c, but you only have to write the code for querying Wikidata and rendering the page once).
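For option 1a, a minimal sketch of generating such an in-page link (the URL scheme, query.wikidata.org/# followed by the URL-encoded query, is how the Wikidata Query Service accepts pre-loaded queries; the helper name is invented):

```ruby
require 'erb'

# Given a SPARQL query string, return a link to the Wikidata Query
# Service with that query pre-loaded, so a page can offer a "re-run
# this query on WDQS" link next to the data it rendered.
def wdqs_link(sparql)
  'https://query.wikidata.org/#' + ERB::Util.url_encode(sparql)
end

puts wdqs_link('SELECT ?item WHERE { ?item wdt:P31 wd:Q11204 }')
```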

This is definitely up for debate (and there are probably other sensible options I haven't mentioned here) so please comment. However, my personal take is that I would do option 2, and then implement the extra bits required for option 3 at a later date (and certainly before making these pages public).

tmtmtmtm commented 6 years ago

I've uploaded the initial spike as https://github.com/tmtmtmtm/wikidata-legislature-browser

I don't think we'd want to use any of that "as is", though it could be treated as something to iterate rapidly from if we wanted to take route 1 above (I'd definitely want to take the logic out of the Page classes though, and create proper classes for different types of Wikidata Items.)

tmtmtmtm commented 6 years ago

In terms of tiny first steps, I'd suggest extending the current /:country/:house/wikidata page (which is not currently scraped, and thus only visible when running the app locally, not on everypolitician.org) to also display the seat count for that legislature via a Wikidata lookup.

That's fairly trivial in terms of the query required (it's a single P1342 property lookup on the item for the legislature), and we know the data already exists in the vast majority of cases (report at https://www.wikidata.org/wiki/Wikidata:EveryPolitician/Report:National_Legislatures/seat_count). It also doesn't require creating any extra pages, so it lets us focus on getting the basics of the approach figured out, which can then be rapidly applied to generating other, more useful, pages.
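As an illustrative sketch (helper names invented, error handling omitted), that P1342 ("number of seats") lookup could go via the wbgetclaims API rather than SPARQL:

```ruby
require 'json'
require 'open-uri'

# URL for a wbgetclaims call fetching just P1342 (number of seats)
# for a legislature's Wikidata item.
def seat_count_url(item_id)
  'https://www.wikidata.org/w/api.php?action=wbgetclaims' \
    "&entity=#{item_id}&property=P1342&format=json"
end

# Pull the quantity out of the claims JSON; Wikidata stores
# quantities as signed strings such as "+650".
def parse_seat_count(json)
  claim = JSON.parse(json).dig('claims', 'P1342', 0)
  return nil unless claim
  claim['mainsnak']['datavalue']['value']['amount'].to_i
end

# Usage (does a network request):
# seats = parse_seat_count(URI.open(seat_count_url('Q...')).read)
```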

tmtmtmtm commented 6 years ago

> I would do option 2, and then implement the extra bits required for option 3 at a later date

I really like the option 3 approach, and if we do go that route it's probably worth switching the scraping process to Capybara / PhantomJS as an entirely distinct task relatively quickly. We already have one page using that sort of progressive enhancement, http://everypolitician.org/needed, where we query the GitHub API in JS and redraw the page in case the "in progress" countries have changed since the last time the site was generated. We currently have to duplicate that logic in both JS and Ruby, so switching like this would already let us remove https://github.com/everypolitician/viewer-sinatra/blob/master/lib/page/needed.rb and prepare the way for doing something similar here.