geeksforsocialchange / PlaceCal

Bring your community together
https://placecal.org
GNU Affero General Public License v3.0

Implement a system for updating neighbourhood data #1741

Closed ivan-kocienski-gfsc closed 1 year ago

ivan-kocienski-gfsc commented 1 year ago

Description

Our neighbourhood data may be out of date: we are getting failed postcode lookups on Partner addresses.

Steps to reproduce

  1. Try making a partner with the postcode "EC1V 2NX"

Implementation idea

  1. Implement a way to import neighbourhood data. We used to have this somewhere, but it needs to be part of our database seeding script.
  2. Download new geo data.
  3. Try to reconcile the new data with the old, collecting a list of records that are (a) in our DB but no longer exist in the new dataset, and (b) in their dataset but not in ours.
  4. Try using lat/long data to find the nearest neighbourhood.
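Steps 3 and 4 could be sketched roughly as follows. This is a minimal plain-Ruby illustration, not PlaceCal's code: the `Neighbourhood` struct, its fields, and the helper names are hypothetical stand-ins for whatever the real model holds. Step 3 is a set difference on ward codes; step 4 picks the candidate whose centroid is closest by great-circle (haversine) distance.

```ruby
require "set"

# Hypothetical record shape: a ward code plus a centroid lat/lng.
Neighbourhood = Struct.new(:code, :name, :lat, :lng, keyword_init: true)

# Step 3: records only in our DB, and records only in the new dataset.
def reconcile(ours, theirs)
  our_codes = ours.map(&:code).to_set
  their_codes = theirs.map(&:code).to_set
  {
    missing_from_new: ours.reject { |n| their_codes.include?(n.code) },
    new_in_dataset: theirs.reject { |n| our_codes.include?(n.code) }
  }
end

EARTH_RADIUS_KM = 6371.0

# Great-circle distance in km between two lat/lng points (haversine formula).
def haversine_km(lat1, lng1, lat2, lng2)
  to_rad = ->(deg) { deg * Math::PI / 180 }
  dlat = to_rad.(lat2 - lat1)
  dlng = to_rad.(lng2 - lng1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(to_rad.(lat1)) * Math.cos(to_rad.(lat2)) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Step 4: the candidate whose centroid is nearest to the given point.
def nearest_neighbourhood(lat, lng, candidates)
  candidates.min_by { |n| haversine_km(lat, lng, n.lat, n.lng) }
end
```

Nearest-centroid matching is only an approximation: a point near a ward boundary can be closer to a neighbouring ward's centroid, so a point-in-polygon test against ward boundaries would be more accurate if boundary data is available.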
katjam commented 1 year ago

https://github.com/geeksforsocialchange/PlaceCal/issues/790 https://github.com/geeksforsocialchange/PlaceCal/issues/783

ivan-kocienski-gfsc commented 1 year ago

Okay, so far I have written scripts that:

Things up next

ivan-kocienski-gfsc commented 1 year ago

Progress update: for the specific problem of "postcodes that postcodes.io resolves but for which we have no matching neighbourhood record", I have found a solution that should work.

Postcodes.io declares which dataset it uses on its 'about' page (currently the November 2022 release). That dataset can be downloaded directly from the ONS website.

(I have written a script that extracts all the postcodes from the Address table).
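Before comparing anything, the extracted postcodes probably need normalising, since addresses may store the same postcode as "ec1v2nx" or "EC1V 2NX". A minimal sketch, assuming the input is a plain list of strings (in the Rails app it might come from something like `Address.pluck(:postcode)`, which is an assumption about the schema); the helper names are hypothetical. It relies on the fact that a UK postcode's inward code is always the last three characters.

```ruby
# Uppercase, drop all whitespace, then put a single space before the
# three-character inward code ("ec1v2nx" -> "EC1V 2NX").
def normalise_postcode(raw)
  compact = raw.to_s.upcase.gsub(/\s+/, "")
  return nil if compact.length < 5 # too short to be a full UK postcode
  "#{compact[0...-3]} #{compact[-3..]}"
end

# Distinct, normalised postcodes, with blanks/invalid entries dropped.
def distinct_postcodes(raw_postcodes)
  raw_postcodes.filter_map { |pc| normalise_postcode(pc) }.uniq.sort
end
```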

I have written a script that takes the raw ONS postcode CSV and produces a "payload". This was done because the raw data is fairly large (1.3 GB of text); the payload is only 800 KB.
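One plausible way to get that kind of size reduction is to stream the CSV and keep only one row per distinct ward, dropping the per-postcode rows. This is a sketch under assumptions, not the script described above: the column names `osward` and `oslaua` are assumptions about the ONS file's header and should be checked against the actual download, and `build_payload` is a hypothetical name.

```ruby
require "csv"
require "json"
require "set"
require "stringio"

# Assumed column names; verify against the header row of the real ONS file.
WARD_COLUMN = "osward"
DISTRICT_COLUMN = "oslaua"

# Stream the CSV row by row (so the 1.3 GB file is never fully in memory)
# and emit one JSON record per distinct ward code.
def build_payload(csv_io, ward_col: WARD_COLUMN, district_col: DISTRICT_COLUMN)
  seen = Set.new
  rows = []
  CSV.new(csv_io, headers: true).each do |row|
    ward = row[ward_col]
    next if ward.nil? || ward.empty? || seen.include?(ward)
    seen << ward
    rows << { "ward" => ward, "district" => row[district_col] }
  end
  JSON.generate(rows.sort_by { |r| r["ward"] })
end
```

Against a real file this would be called as `File.open(path) { |f| build_payload(f) }`.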

I have written a rake task that takes the payload, clears out the neighbourhood table, and uploads the new dataset.
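The clear-and-reload step is safest if the payload is fully parsed and validated before the table is touched. A minimal sketch of that ordering, using a hypothetical in-memory `NeighbourhoodStore` in place of the real table; in the Rails app the equivalent would probably be a `delete_all` followed by `insert_all` inside a `transaction` block, so a failure rolls back rather than leaving the table empty.

```ruby
require "json"

# Hypothetical in-memory stand-in for the neighbourhood table.
class NeighbourhoodStore
  attr_reader :rows

  def initialize(rows = [])
    @rows = rows
  end

  # Swap all rows in one step, so readers never see a half-loaded table.
  def replace_all!(new_rows)
    @rows = new_rows.freeze
  end
end

# Parse and validate first; only then clear and replace the stored rows.
def import_payload!(store, payload_json)
  records = JSON.parse(payload_json) # raises JSON::ParserError on bad input
  raise ArgumentError, "payload is empty" if records.empty?
  store.replace_all!(records)
  records.size
end
```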

Then, for each postcode extracted earlier, I look it up on postcodes.io and try to match the ward code in the response to a neighbourhood.

If the number of failing postcodes drops between the pre-run and post-run checks, the data has been updated successfully.
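The before/after check could be sketched like this. It assumes the postcodes.io response carries the ward's GSS code at `result.codes.admin_ward`; `fetch_postcode`, `ward_code_from` and `failing_postcodes` are hypothetical helper names. The responses are fetched once and reused for both counts, so the only thing that differs between the two runs is the set of ward codes in our neighbourhood table.

```ruby
require "json"
require "net/http"
require "set"
require "uri"

# One network call per postcode; postcodes.io accepts unspaced postcodes.
# Net::HTTP.get returns the body even for a 404 ("Invalid postcode").
def fetch_postcode(postcode)
  uri = URI("https://api.postcodes.io/postcodes/#{postcode.delete(' ')}")
  JSON.parse(Net::HTTP.get(uri))
end

# Ward code from a parsed response, or nil if the lookup failed.
def ward_code_from(response)
  response.dig("result", "codes", "admin_ward")
end

# Postcodes with no ward code, or whose ward code has no neighbourhood.
def failing_postcodes(responses_by_postcode, known_ward_codes)
  responses_by_postcode.reject do |_postcode, response|
    code = ward_code_from(response)
    code && known_ward_codes.include?(code)
  end.keys
end
```

Usage would be roughly: build `responses = postcodes.to_h { |pc| [pc, fetch_postcode(pc)] }`, then compare `failing_postcodes(responses, old_codes).size` with `failing_postcodes(responses, new_codes).size`.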

TODO:

To consider

We could actually host our own postcode lookup system using the ONS dataset, which would mean we could download and update our postcode table more often than postcodes.io does. However, this table would be large to set up, and populating it for testing would be very slow.
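For illustration only, the core of a self-hosted lookup is just a postcode-to-ward index. This sketch builds it in memory from an ONS-style CSV; the column names `pcds` and `osward` are assumptions, and the helper names are hypothetical. With the full 1.3 GB file an in-memory hash is impractical, which is why a real version would be a database table, with the setup and test-seeding cost noted above.

```ruby
require "csv"
require "stringio"

# Build a postcode -> ward code index, keyed on the unspaced uppercase postcode.
def build_postcode_index(csv_io, postcode_col: "pcds", ward_col: "osward")
  index = {}
  CSV.new(csv_io, headers: true).each do |row|
    key = row[postcode_col].to_s.upcase.delete(" ")
    index[key] = row[ward_col] unless key.empty?
  end
  index
end

# Look up a postcode regardless of case or spacing; nil if unknown.
def lookup_ward(index, postcode)
  index[postcode.to_s.upcase.delete(" ")]
end
```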

kimadactyl commented 1 year ago

I do agree about self-hosting postcode data, but imo that should deffo be another ticket/issue, even if we do it right after :) This currently feels like 2 issues in 1.

Great progress on this - good to finally have a plan for it.

I take it you noticed that there's currently a Python script doing the importing - it would be great to remove this and have it all in native Ruby.

ivan-kocienski-gfsc commented 1 year ago

I am not suggesting we do the postcode lookup side of things at this point. I was just commenting that we have all the information necessary to do that, should that become something we'd like to look at in future.

ivan-kocienski-gfsc commented 1 year ago

The final stage of this work would be to go through and sweep out the old code.

(There is some stuff in the migrations that could go.)

kimadactyl commented 1 year ago

Propose we close off this ticket and open smaller ones.

r-ferrier commented 1 year ago

There is a linked PR related to this here: https://github.com/geeksforsocialchange/PlaceCal/pull/1820

r-ferrier commented 1 year ago

Closing, as this has been addressed in other tickets.