CampaignLab / data-pipeline

Scripts and schemas that aim to make data from the inventory easier to analyse

Consistently Labelling Wards #23

Open hannah-o-rourke opened 5 years ago

hannah-o-rourke commented 5 years ago

Task

We need to figure out how to match up ward data from different sources with a consistent identifier for each ward.

In the 2018 election results data set, wards are labelled by election_id. Example: local.huntingdonshire.2018-05-03,local.huntingdonshire.yaxley.2018-05-03

This refers to Yaxley and Farcet ward in the local authority area of Huntingdonshire.

The same ward is also referred to by the government code E05002790.

tomwwagstaff commented 5 years ago

It sounds like the code history database will be useful for this: https://ons.maps.arcgis.com/home/item.html?id=a6a1247a7d8e45068011e8f482cdf3c5

morkeltry commented 5 years ago

I'm pretty sure the E05002790-type codes are the most common, both in datasets and in the availability of converters and lookup tables.
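
If it helps when scanning a new dataset, here is a minimal sketch for spotting values that have the same shape as E05002790. The pattern (one capital letter followed by eight digits) is an assumption taken from that single example, and the function names are made up for illustration.

```python
import re

# Assumption: codes look like the example in this issue, E05002790,
# i.e. one capital letter followed by exactly eight digits.
GSS_CODE_PATTERN = re.compile(r"[A-Z]\d{8}")


def looks_like_gss_code(value):
    """Return True if value has the same shape as E05002790."""
    return GSS_CODE_PATTERN.fullmatch(value.strip()) is not None


def split_codes_and_names(values):
    """Separate values that look like codes from free-text names."""
    codes = [v for v in values if looks_like_gss_code(v)]
    names = [v for v in values if not looks_like_gss_code(v)]
    return codes, names
```

A shape check like this only says a value could be a code; it does not confirm the code exists in any lookup table.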

LydiaMonnington commented 5 years ago

What output do we want here? I'm thinking a table of our canonical wards list, other info about those wards and other columns that could be matched on. Is CSV the best format?
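
To make the question concrete, here is a rough sketch of what one row of such a CSV could look like, using the ward from the top of this issue. The column names are placeholders, not an agreed schema.

```python
import csv
import io

# Hypothetical column names; the thread has not agreed on a schema.
FIELDNAMES = ["ward_code", "ward_name", "local_authority", "election_id"]


def wards_to_csv(rows):
    """Serialise a list of ward dicts to CSV text with a fixed header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# The one ward described at the top of this issue.
example_rows = [
    {
        "ward_code": "E05002790",
        "ward_name": "Yaxley and Farcet",
        "local_authority": "Huntingdonshire",
        "election_id": "local.huntingdonshire.yaxley.2018-05-03",
    }
]

csv_text = wards_to_csv(example_rows)
```

Keeping the code, the human-readable name and the election_id side by side means any of the three can serve as a join key against other sources.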

LydiaMonnington commented 5 years ago

From Tom: Ward Name to Local Authority
http://geoportal.statistics.gov.uk/datasets/046394602a6b415e9fe4039083ef300e_0
http://geoportal.statistics.gov.uk/datasets/417e93f21c5c419283ac23abc8eedcce_0

LydiaMonnington commented 5 years ago

This appears to be the May 2018 ONS lookup from output area to ward to local authority district for England and Wales: http://geoportal.statistics.gov.uk/datasets/interim-output-area-to-ward-to-local-authority-district-may-2018-lookup-in-england-and-wales

LydiaMonnington commented 5 years ago

Cleaning function to use on wards and councils

import re

def normalise_data(text):
    """Normalise a ward or council name so that variants compare equal."""
    text = text.lower()

    # Replace with space
    text = text.replace("-", " ")

    # Replace with no space
    # "and" is removed only as a whole word, so names like "sandwell" survive
    text = re.sub(r"\band\b", "", text)
    text = text.replace("city of", "")
    text = text.replace("&", "")
    text = text.replace(".", "")
    text = text.replace(",", "")
    text = text.replace("'", "")

    # Collapse any run of whitespace into a single space
    text = re.sub(r"\s+", " ", text)
    text = text.strip()
    return text

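
A possible way to use a cleaning function like the one above: build a lookup keyed on the cleaned name from one source, then look up cleaned names from the other. This is only a sketch. `normalise_name` and `match_wards` are hypothetical names, and the compact cleaner here removes "and" only as a whole word.

```python
import re


def normalise_name(text):
    """Compact cleaner: lower-case, drop joining words and punctuation."""
    text = text.lower().replace("-", " ")
    text = re.sub(r"\band\b|city of|[&.,']", "", text)
    return re.sub(r"\s+", " ", text).strip()


def match_wards(source_a, source_b):
    """Pair up ward names from two lists that share a cleaned form."""
    lookup = {normalise_name(name): name for name in source_b}
    matches = {}
    for name in source_a:
        key = normalise_name(name)
        if key in lookup:
            matches[name] = lookup[key]
    return matches
```

Names that only match within one local authority would still need the council name as part of the key, since the same ward name can appear under more than one council.
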
daniel-fahey commented 5 years ago

Note that the May 2018 local election results used the election_id as defined by Democracy Club (https://elections.democracyclub.org.uk/reference_definition/). They also made a Python module for building these IDs (https://github.com/DemocracyClub/uk-election-ids). I kept the IDs in place, thinking it would be better not to 'reinvent the wheel', but maybe separate columns would be more helpful.
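
If separate columns turn out to be more useful, a sketch like the following could split the IDs without changing how they are stored. It only handles the two shapes quoted at the top of this issue (three or four dot-separated parts), and the function name and output keys are made up for illustration. It does not use the uk-election-ids module.

```python
def split_election_id(election_id):
    """Split a local election_id into its dot-separated components.

    Assumes one of the two shapes seen in this issue:
      local.<organisation>.<date>
      local.<organisation>.<division>.<date>
    """
    parts = election_id.split(".")
    if len(parts) not in (3, 4):
        raise ValueError(f"Unexpected election_id shape: {election_id!r}")
    return {
        "election_type": parts[0],
        "organisation": parts[1],
        # The ward-level part is absent for council-wide IDs
        "division": parts[2] if len(parts) == 4 else None,
        "date": parts[-1],
    }
```

For example, `split_election_id("local.huntingdonshire.yaxley.2018-05-03")` gives "huntingdonshire" as the organisation and "yaxley" as the division.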