everypolitician / everypolitician-data

data for national legislatures worldwide
http://everypolitician.org/

Uganda (Parliament): Unrelated changes in source-specific builds #31158

Open ondenman opened 7 years ago

ondenman commented 7 years ago

Problem with Incoming Data

Legislature

Uganda (Parliament)

Problem

Refreshing a single source produces changes that are unrelated to that source.

Other info

chrismytton commented 7 years ago

There are a few different things going on here.

Build output is different on Mac vs Linux

The currently committed files were generated locally on a Mac, but the rebuild is happening on Heroku, which runs Ubuntu 14.04.5 LTS.

Diff between Mac and Linux builds.

So it seems that FuzzyMatch is returning different results on Mac vs Linux.

From Slack conversation about this with @tmtmtmtm:

The higher-level problem is that we shouldn't be using FuzzyMatch at build time. If we need to match, we should use reconciliation files like we do everywhere else. We can use FuzzyMatch to help generate that file, but it shouldn't be doing calculations like that when building.
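The following is a minimal sketch of the split described above, written in Python rather than the project's own tooling. It makes several assumptions. The reconciliation file is taken to be a two-column CSV (`name`, `id`). The standard library's `difflib.get_close_matches` stands in for FuzzyMatch. Every function name is hypothetical. In this sketch, fuzzy matching only proposes rows for a person to review. The build step does exact lookups and fails loudly on anything unreconciled.

```python
import csv
import difflib
import io


def suggest_reconciliation(source_names, ocd_areas, cutoff=0.8):
    """Offline step: propose a name -> OCD id mapping for a person to review.

    difflib.get_close_matches is a stand-in for FuzzyMatch (an assumption).
    Names with no candidate above `cutoff` get an empty id to fill in by hand.
    """
    id_by_name = {area["name"]: area["id"] for area in ocd_areas}
    rows = []
    # Sorted so the generated file has a stable, diff-friendly row order.
    for name in sorted(source_names):
        match = difflib.get_close_matches(name, list(id_by_name), n=1, cutoff=cutoff)
        rows.append({"name": name, "id": id_by_name[match[0]] if match else ""})
    return rows


def write_reconciliation(rows):
    """Serialise reviewed rows to the CSV that gets committed."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "id"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def load_reconciliation(csv_text):
    """Build step: read the committed file; rows still lacking an id are skipped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"]: row["id"] for row in reader if row["id"]}


def area_id_for(name, reconciliation):
    """Build step: exact lookup only, with no similarity scoring at build time."""
    try:
        return reconciliation[name]
    except KeyError:
        # An unreconciled name should be fixed in the file, not guessed at.
        raise KeyError(f"no reconciliation entry for {name!r}") from None
```

With this split, the committed CSV decides each match, not a similarity score computed during the build. A Mac build and a Linux build that read the same file should therefore map names identically.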

Duplicate area names in OCD source

The problem this issue has revealed is that the Uganda OCD IDs source has duplicate area names that differ only in case, e.g. "Nakifuma County":

id,name
ocd-division/country:ug/region:central/subregion:buganda/district:mukono/constituency:nakifuma_county,Nakifuma county
ocd-division/country:ug/region:central/subregion:buganda/district:mukono/constituency:nakifuma_county,Nakifuma County

So we'd need to tidy that up as well as fixing the Mac vs Linux issue. (Though we might be able to ignore this for now if we pin down the area matches in a reconciliation file.)
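A small check along these lines could catch such rows before they reach a build. This is a sketch that assumes the OCD source is a CSV with the `id` and `name` columns shown above. `case_only_duplicates` is a hypothetical helper, not part of the project.

```python
import csv
import io
from collections import defaultdict


def case_only_duplicates(csv_text):
    """Return {(id, casefolded name): [spellings]} for every id that carries
    two or more names differing only in letter case."""
    spellings = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Group by id plus the case-insensitive form of the name.
        spellings[(row["id"], row["name"].casefold())].add(row["name"])
    # Keep only groups with more than one distinct spelling.
    return {key: sorted(names) for key, names in spellings.items() if len(names) > 1}
```

Run against the two Nakifuma rows above, it would report a single group containing both spellings. An empty result means no case-only duplicates remain.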

Next steps

tmtmtmtm commented 7 years ago

As a quick improvement here, I tidied up some of the data in the spreadsheet, regenerated the OCD file, and rebuilt the EP data.

We still need to either fix the underlying issue here or, indeed, simply replace the whole concept, but this should help a little.