Open ondenman opened 7 years ago
There are a few different things going on here.
The currently committed files were generated locally, on a Mac, but the rebuild is happening on Heroku, which is Ubuntu 14.04.5 LTS.
Diff between Mac and Linux builds.
So it seems that FuzzyMatch is returning different results on Mac vs Linux.
From Slack conversation about this with @tmtmtmtm:
The higher level problem is that we shouldn't be using Fuzzy Match at build time. If we need to match, we should use reconciliation files like we do everywhere else. We can use FuzzyMatch to help generate that file, but it shouldn't be doing calculations like that when building.
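To illustrate the approach described above, here is a minimal sketch of a build step that resolves areas by exact lookup in a reconciliation file and never falls back to fuzzy matching. The two-column CSV layout (`name`, `id`) and the helper names `load_reconciliation` and `resolve_area` are assumptions for illustration only, not the project's actual file format or code.

```python
import csv
import io


def load_reconciliation(csv_file):
    """Read a reconciliation CSV into a dict of source name -> OCD ID.

    The 'name' and 'id' column headers are hypothetical.
    """
    mapping = {}
    for row in csv.DictReader(csv_file):
        name, ocd_id = row["name"], row["id"]
        # Refuse a file that maps one name to two different IDs.
        if name in mapping and mapping[name] != ocd_id:
            raise ValueError(f"conflicting IDs for {name!r}")
        mapping[name] = ocd_id
    return mapping


def resolve_area(name, mapping):
    """Exact lookup only: an unmatched name fails the build instead of being guessed."""
    try:
        return mapping[name]
    except KeyError:
        raise KeyError(f"no reconciliation entry for {name!r}") from None


# In-memory stand-in for a reconciliation file, using the ID quoted in this issue.
SAMPLE = io.StringIO(
    "name,id\n"
    "Nakifuma County,ocd-division/country:ug/region:central/subregion:buganda"
    "/district:mukono/constituency:nakifuma_county\n"
)
mapping = load_reconciliation(SAMPLE)
```

Because the lookup is a plain dictionary access, the result should not depend on the platform the build runs on. FuzzyMatch (or anything similar) could still be used offline to propose rows for the file, with a human confirming them before they are committed.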
The problem this issue has revealed is that the Uganda OCD IDs source has duplicate area names that only differ in case, e.g. "Nakifuma County":
id | name |
---|---|
ocd-division/country:ug/region:central/subregion:buganda/district:mukono/constituency:nakifuma_county | Nakifuma county |
ocd-division/country:ug/region:central/subregion:buganda/district:mukono/constituency:nakifuma_county | Nakifuma County |
So we'd need to tidy that up as well as fixing the Mac vs Linux issue. (Though we might be able to ignore this for now if we pin down the area matches in a reconciliation file.)
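A quick way to find every such case-only duplicate in the OCD IDs source is to group rows by a case-folded name. This is a rough sketch: the function name `case_insensitive_duplicates` is made up for illustration, the rows are assumed to be already loaded as `(id, name)` pairs, and the third sample row is hypothetical.

```python
from collections import defaultdict


def case_insensitive_duplicates(rows):
    """Return groups of (id, name) rows whose names differ only in case."""
    groups = defaultdict(set)
    for ocd_id, name in rows:
        cleaned = name.strip()
        groups[cleaned.casefold()].add((ocd_id, cleaned))
    # Keep only the folded names that have more than one distinct spelling.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }


NAKIFUMA_ID = (
    "ocd-division/country:ug/region:central/subregion:buganda"
    "/district:mukono/constituency:nakifuma_county"
)
rows = [
    (NAKIFUMA_ID, "Nakifuma county"),
    (NAKIFUMA_ID, "Nakifuma County"),
    # Hypothetical non-duplicate row, for contrast only.
    ("ocd-division/country:ug/example:hypothetical", "Example County"),
]
duplicates = case_insensitive_duplicates(rows)
```

Here `duplicates` has a single key, `"nakifuma county"`, listing both spellings, so the check could be run over the whole spreadsheet before regenerating the OCD file.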
As a quick improvement here, I tidied up some of the data in the spreadsheet, regenerated the OCD file, and rebuilt the EP data.
We still need to either fix the underlying issue here or, indeed, simply replace the whole concept, but this should help a little bit.
Problem with Incoming Data
Legislature
Uganda (Parliament)
Problem
Refreshing a specific source results in unrelated changes.
Other info
sort_by issues we had before.