Closed ivan-kocienski-gfsc closed 1 year ago
Okay, so far I have written scripts that:
Things up next
Progress update: in specific terms of "fixing broken postcodes", meaning postcodes that postcodes.io returns a result for but where we have no matching neighbourhood record, I have found a solution that should work.
Postcodes.io declares which dataset it uses on its 'about' page (it is currently Nov 2022). This can then be downloaded from the ONS website directly.
(I have written a script that extracts all the postcodes from the Address table).
I have written a script that takes the raw ONS postcode CSV and produces a "payload". This was done because the raw file is a fairly large amount of data (1.3 GB of text). The payload is only 800 KB.
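A minimal sketch of what that reduction step could look like, not the actual script. It assumes the payload only needs the distinct ward codes, and that the ward code lives in a column called `osward`; both the helper name `build_payload` and the column name are assumptions, so check them against the header row of the downloaded file.

```ruby
require "csv"
require "json"
require "set"

# Hypothetical helper: stream the raw CSV row by row (so the 1.3 GB file is
# never held in memory) and keep only the distinct ward codes.
# `ward_column` is an assumed column name, not confirmed by this thread.
def build_payload(csv_io, ward_column: "osward")
  wards = Set.new

  CSV.new(csv_io, headers: true).each do |row|
    code = row[ward_column].to_s.strip
    wards << code unless code.empty? # skip rows with no ward assigned
  end

  # A sorted JSON array keeps the payload small and diff-friendly.
  JSON.generate(wards.to_a.sort)
end
```

For a real file this would be called with an open file handle, e.g. `File.open(path) { |f| build_payload(f) }`, where `path` points at the downloaded CSV.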
I have written a rake task that takes the payload and clears out the neighbourhood table and uploads the new dataset.
Then, using the postcodes from earlier, I look up each postcode on postcodes.io and try to match the ward code in the response to a neighbourhood.
If the number of failing postcodes is reduced from pre-run to post-run then the data has been updated successfully.
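The check described above can be sketched roughly as follows. This is an illustration under assumptions, not the code used here: `failing_postcodes` and `POSTCODES_IO_FETCHER` are hypothetical names, the known ward codes are passed in as a plain array rather than read from the neighbourhood table, and the ward code is read from `result.codes.admin_ward` in the postcodes.io response.

```ruby
require "json"
require "net/http"
require "uri"

# Default fetcher: asks postcodes.io for a single postcode and returns its
# ward code, or nil when the lookup fails or the postcode is unknown.
POSTCODES_IO_FETCHER = lambda do |postcode|
  compact = URI.encode_www_form_component(postcode.delete(" "))
  response = Net::HTTP.get_response(URI("https://api.postcodes.io/postcodes/#{compact}"))
  return nil unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body).dig("result", "codes", "admin_ward")
end

# Hypothetical helper: returns the postcodes whose ward code is missing or
# does not match any known neighbourhood ward code.
# The fetcher is injectable so the check can run without network access.
def failing_postcodes(postcodes, known_ward_codes, fetcher: POSTCODES_IO_FETCHER)
  postcodes.reject do |postcode|
    ward_code = fetcher.call(postcode)
    ward_code && known_ward_codes.include?(ward_code)
  end
end
```

Running it once against the old ward codes and once against the new ones, then comparing the sizes of the two result lists, gives the pre-run and post-run numbers.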
We could actually host our own postcode lookup system using the ONS dataset, which would mean we could download and upgrade our postcode table more often than postcodes.io does. This table would be large to set up, and populating it for testing would be very slow.
I do agree about self-hosting postcode data, but imo that should deffo be another ticket/issue, even if we do it right after :) This feels like 2 issues in 1 currently.
Great progress on this - good to finally have a plan for it.
I take it you noticed that there's currently a Python script doing the importing - would be great to remove this and have it all in native Ruby.
I am not suggesting we do the postcode lookup side of things at this point. I was just commenting that we have all the information necessary to do that, should that become something we'd like to look at in future.
The final stage of this work would be to go through and clean-sweep the old code out.
(There is some stuff in the migrations that could go.)
Propose we close off this ticket and open smaller ones
There is a linked PR for something relating to this here: https://github.com/geeksforsocialchange/PlaceCal/pull/1820
closing as addressed in other tickets
Description
Our neighbourhood data may be getting out of date, as we are getting failed postcode lookups on Partner Addresses.
Steps to reproduce
Implementation idea