everypolitician / compare_with_wikidata

Library for diffing Wikidata and CSVs
MIT License

Extraneous changes due to pathname switches #95

Closed tmtmtmtm closed 7 years ago

tmtmtmtm commented 7 years ago

When I manually refreshed a prompt yesterday afternoon I got a lot of invisible diffs changing the path to various templates from "prompts/Seat Count/…" to "prompts/Seat_Count/…" (with an underscore instead of a space).

Today's automated update switched them all back again. When I re-ran it manually just now, they flipped again.

Another re-run leaves it as is, so it looks like it's doing something different when triggered manually vs the scheduled run (rather than just flipping on each successive run).

If nothing else has actually changed, this creates a needless page revision (potentially prompting any users who have it on their Watchlist to look at a page that hasn't actually changed); if something material has changed and might need investigation, this makes it harder to spot amongst lots of other diffs.

mhl commented 7 years ago

The nightly updates are run from https://github.com/everypolitician/prompter/blob/master/script/update-all-prompts which gets titles from the results of the MediaWiki API's embeddedin action:

result = client.action(:query, list: 'embeddedin', eititle: WIKI_TEMPLATE_NAME, eilimit: 500)

The titles you get back from these have the namespace and page name, with spaces:

irb(main):011:0> result.data['embeddedin'].map { |ei| ei['title'] }
=> ["User:Chris Mytton/sandbox/daff us senate", "User:Chris Mytton/sandbox/prompts/heads of government", "User:Oravrattas/prompts/Seat Count", "User:Oravrattas/prompts/Riigikogu", "User:Oravrattas/prompts/Riigikogu 13 EveryPolitician", "User:Lucyfediachambers/sandbox/Daff UK Twitter", "User:Oravrattas/prompts/Riigikogu-twitter", "User:Chris Mytton/sandbox/prompts/Riigikogu", "User:Zarino/wikidata csv 1", "User:Oravrattas/prompts/Riigikogu 12 EveryPolitician", "User:Mhl20/prompts/Minsters University", "User:Oravrattas/prompts/Riigikogu 11 EveryPolitician", "User:Mhl20/prompts/test errors", "User:Chris Mytton/sandbox/prompts/Pakistan National Assembly", "User:Lucyfediachambers/sandbox/prompts/UK", "User:Oravrattas/prompts/Northern Ireland Assembly", "User:Oravrattas/prompts/Scottish Parliament", "User:Chris Mytton/sandbox/prompts/Pakistan National Assembly official site", "User:Oravrattas/prompts/Nothern Irish Assembly", "User:Oravrattas/prompts/Finland/Eduskunta", "User:Lucyfediachambers/sandbox/prompts/Germany", "User:Alessandro Piscopo/sandbox/prompts/Italy", "User:Mhl20/prompts/test query results", "User:Mhl20/prompts/test seat count"]

Manual refreshes, however, are triggered via the URL generated from this template: https://www.wikidata.org/w/index.php?title=Template:Compare_Wikidata_with_CSV/refresh_url&action=edit

... which uses the magic word FULLPAGENAMEE. That magic word uses underscores instead of spaces - to get the space form, FULLPAGENAME can be used instead. I'm going to change that template so that the space form is used for manual refreshes as well. (Going by the description here: https://www.mediawiki.org/wiki/Manual:Title.php it seems as if spaces and underscores are generally interchangeable, but the underscore form is preferred in URLs - we're generally not using these in a URL context, so the space form seems more natural anyway.)

mhl commented 7 years ago

Ah, no - the Clickable button template doesn't cope with a url parameter with spaces in it, it seems. With the template change I suggested above, the page shows this after a refresh:

[Screenshot: broken-button]

... and the URL the button links to is missing the trailing "University": https://tools.wmflabs.org/prompter/?mediawiki_site=www.wikidata.org&page_title=User:Mhl20/prompts/Minsters
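For illustration only, here is a rough Ruby sketch of why a raw space is a problem once the title is placed in a URL, and how form-encoding keeps the whole title intact. The `refresh_url` helper is hypothetical (it is not part of prompter or of the wiki template), and it only mirrors the query parameters visible in the URL above:

```ruby
require 'uri'

# Hypothetical helper, for illustration only: builds a prompter-style URL
# with the page title form-encoded so that a space cannot cut it short.
def refresh_url(page_title)
  query = URI.encode_www_form(
    mediawiki_site: 'www.wikidata.org',
    page_title: page_title
  )
  "https://tools.wmflabs.org/prompter/?#{query}"
end

url = refresh_url('User:Mhl20/prompts/Minsters University')
puts url

# Decoding the query string gives back the full title, including the
# part after the space.
decoded = URI.decode_www_form(URI(url).query).to_h
puts decoded['page_title']
```

The encoded URL contains no literal space, so nothing after "Minsters" gets dropped when the link is parsed.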

A better (more robust) alternative overall, I think, is to change compare_with_wikidata to consistently normalize the page title to the spaces form before using it...
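A minimal sketch of what that normalization could look like, assuming it is enough to map underscores to spaces. The method name `normalize_page_title` is made up for this example and is not taken from the compare_with_wikidata code:

```ruby
# Hypothetical helper: converts the underscore form of a page title to
# the space form, so that both triggers end up with the same string.
def normalize_page_title(title)
  title.tr('_', ' ')
end

# Title as returned by the embeddedin API query (nightly run).
api_title = 'User:Oravrattas/prompts/Seat Count'
# Title as produced by the underscore form (manual refresh).
manual_title = 'User:Oravrattas/prompts/Seat_Count'

puts api_title == manual_title
puts normalize_page_title(api_title) == normalize_page_title(manual_title)
```

With both code paths going through the same normalization, the generated template paths no longer depend on how the refresh was triggered.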

mhl commented 7 years ago

I think this should be fixed with: https://github.com/everypolitician/compare_with_wikidata/pull/96 which is deployed now - please reopen this if that's not working as expected now.