Open hannah-o-rourke opened 5 years ago
It sounds like the code history database will be useful for this: https://ons.maps.arcgis.com/home/item.html?id=a6a1247a7d8e45068011e8f482cdf3c5
I'm pretty sure codes of the E05002790 type are the most common, both in datasets and in the availability of converters and lookup tables.
What output do we want here? I'm thinking a table of our canonical wards list, other info about those wards and other columns that could be matched on. Is CSV the best format?
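To make the CSV question concrete, here is a minimal sketch of what one row of such a table could look like. The column names are hypothetical (no schema has been agreed in this issue), and the single row uses the Yaxley and Farcet example from the task description.

```python
import csv
import io

# Hypothetical column names; the issue has not agreed on a schema yet.
FIELDNAMES = ["gss_code", "ward_name", "local_authority", "election_id"]

# One example row, using the Yaxley and Farcet ward from the task description.
rows = [
    {
        "gss_code": "E05002790",
        "ward_name": "Yaxley and Farcet",
        "local_authority": "Huntingdonshire",
        "election_id": "local.huntingdonshire.yaxley.2018-05-03",
    },
]


def write_lookup(rows, handle):
    """Write the lookup rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
write_lookup(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

CSV keeps the table readable in a spreadsheet and easy to join on from pandas or SQL, which is probably enough for a lookup of this size.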
From Tom: Ward Name to Local Authority http://geoportal.statistics.gov.uk/datasets/046394602a6b415e9fe4039083ef300e_0 http://geoportal.statistics.gov.uk/datasets/417e93f21c5c419283ac23abc8eedcce_0
This appears to be the May 2018 ONS lookup from output area to ward to local authority district (England and Wales): http://geoportal.statistics.gov.uk/datasets/interim-output-area-to-ward-to-local-authority-district-may-2018-lookup-in-england-and-wales
Cleaning function to use on wards and councils
import re

def normalise_data(text):
    """Normalise a ward or council name so different spellings compare equal."""
    text = text.lower()
    text = text.replace("-", " ")
    # Drop filler words, matching whole words only so that names
    # such as "Sandwell" are left intact
    text = re.sub(r"\b(and|city of)\b", " ", text)
    # Remove punctuation without leaving a space
    for char in "&.,'":
        text = text.replace(char, "")
    # Collapse runs of whitespace and trim the ends
    text = " ".join(text.split())
    return text
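As a rough illustration of how a cleaning step like this could be used for matching, the sketch below indexes one source by normalised name and looks up names from another. The `normalise` helper is a compact stand-in written for this example, not the function above, and the sample ward lists are hypothetical.

```python
import re


def normalise(text):
    """Compact stand-in for the cleaning function (an assumption for this sketch)."""
    text = text.lower().replace("-", " ")
    text = re.sub(r"\b(and|city of)\b", " ", text)
    text = re.sub(r"[&.,']", "", text)
    return " ".join(text.split())


# Hypothetical sample data: ward names as spelled in two different sources.
results_wards = ["Yaxley & Farcet", "St. Ives South"]
ons_wards = {"E05002790": "Yaxley and Farcet"}

# Index the ONS names by their normalised form, then look up each result name.
ons_by_key = {normalise(name): code for code, name in ons_wards.items()}
matches = {name: ons_by_key.get(normalise(name)) for name in results_wards}

for name, code in matches.items():
    print(name, "->", code)
```

Names that come back as `None` are the ones that would need a manual check or a different matching column.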
Note that the May 2018 local election results use the election_id format defined by Democracy Club (https://elections.democracyclub.org.uk/reference_definition/). They also publish a Python module for building these IDs (https://github.com/DemocracyClub/uk-election-ids). I kept these IDs in place, thinking it would be better not to 'reinvent the wheel', but maybe separate columns would be more helpful.
Task
We need to figure out how to match up ward data from different sources with a consistent identifier for each ward.
In the 2018 election results dataset, wards are labelled by election_id. Example: local.huntingdonshire.2018-05-03,local.huntingdonshire.yaxley.2018-05-03
This refers to Yaxley and Farcet ward in the local authority area of Huntingdonshire.
The same ward is identified in government codes as E05002790.
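If separate columns turn out to be more useful than the combined election_id, the IDs in the example could be split with something like the sketch below. It assumes only the two simple forms seen in the example (type.organisation.date and type.organisation.division.date) and rejects anything else. The function name and field names are hypothetical, and it does not use the uk-election-ids module.

```python
from datetime import date


def parse_election_id(election_id):
    """Split a simple election ID into separate fields.

    Assumes the form type.organisation.date or
    type.organisation.division.date; other forms raise ValueError.
    """
    parts = election_id.split(".")
    if len(parts) not in (3, 4):
        raise ValueError(f"unsupported election_id: {election_id!r}")
    return {
        "election_type": parts[0],
        "organisation": parts[1],
        "division": parts[2] if len(parts) == 4 else None,
        "date": date.fromisoformat(parts[-1]),
    }


# The example cell from the 2018 results holds two comma-separated IDs.
cell = "local.huntingdonshire.2018-05-03,local.huntingdonshire.yaxley.2018-05-03"
parsed = [parse_election_id(part) for part in cell.split(",")]

for fields in parsed:
    print(fields)
```

Note that the division slug (`yaxley`) is shorter than the full ward name (Yaxley and Farcet), so the slug alone is unlikely to be enough to match against ONS ward names.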