Closed MattStammers closed 1 year ago
The raw data spec requires townsend_score_decile, which has 10 levels. A new column called townsend_score_quintiles is generated as part of the feature engineering step, which appears in the AdmittedCareFeatureSchema. LTH use deciles as standard IMD uses deciles.
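For illustration, a minimal sketch of how a quintile could be derived from the decile during feature engineering. It assumes deciles are coded 1 to 10 and that adjacent pairs collapse into one quintile (1-2 into 1, 3-4 into 2, and so on); the function name decile_to_quintile is hypothetical and is not taken from the pipeline.

```python
from typing import Optional


def decile_to_quintile(decile: Optional[int]) -> Optional[int]:
    """Collapse a 1-10 decile into a 1-5 quintile (hypothetical helper).

    Assumes adjacent deciles pair up: (1, 2) -> 1, (3, 4) -> 2, ..., (9, 10) -> 5.
    Missing values are passed through unchanged.
    """
    if decile is None:
        return None
    if not 1 <= decile <= 10:
        raise ValueError(f"decile must be between 1 and 10, got {decile}")
    # Integer division pairs up adjacent deciles.
    return (decile + 1) // 2
```

Whether the direction of the scale (1 = most or least deprived) is preserved correctly depends on how the source deciles are coded, which is not stated here.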
Fair enough. I will close this, as you are right and we can't change the original protocol.
Townsend score (2011) is different from IMD (2019).
Data spec should be townsend_score_quintile instead of townsend_score_decile.
https://statistics.ukdataservice.ac.uk/dataset/2011-uk-townsend-deprivation-scores
Why are we using the 2011 Townsend score instead of the 2019 IMD? @quindavies, would you update this issue with the response from Sheffield please?
I will hold off changing how the pipeline works for this particular variable for now.
From Sheffield:
We were originally going to use IMD scores but these only cover England. As Edinburgh will be providing data for these projects we needed a deprivation score that covered both England and Scotland, hence we chose the Townsend score. There is IMD 2020 for Scotland but the quintiles derived from this wouldn't be consistent with those for IMD for England.
As far as I am aware, 2011 is the most up-to-date version of the Townsend score. This may cause some issues, as new postcodes will have been created since 2011 and therefore won't be in the database. To keep the data extract as simple as possible, I think we will just have to treat the deprivation score/quintile for these postcodes as missing.
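A rough sketch of the "treat as missing" approach described above, assuming the extract has access to some postcode-to-quintile lookup. The lookup dictionary, its keys, and both function names are hypothetical, and the example values are made up for illustration only.

```python
from typing import Dict, Optional


def normalise_postcode(postcode: str) -> str:
    """Strip all whitespace and upper-case, so 's10 2jf' and 'S102JF' match."""
    return "".join(postcode.split()).upper()


def lookup_townsend_quintile(
    postcode: str, lookup: Dict[str, int]
) -> Optional[int]:
    """Return the quintile for a postcode, or None if it is not in the lookup.

    Postcodes created after 2011 will not be in a 2011-based lookup,
    so they come back as None (missing) rather than raising an error.
    """
    return lookup.get(normalise_postcode(postcode))


# Illustrative lookup only; these are not real Townsend values.
example_lookup = {"S102JF": 3}
known = lookup_townsend_quintile("s10 2jf", example_lookup)
unknown = lookup_townsend_quintile("ZZ99 9ZZ", example_lookup)
```

Returning None instead of a sentinel such as 0 keeps missing values clearly separate from the valid 1 to 5 range.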
In the validate.py file, the Townsend score validation allows for up to 10 levels (deciles). This should be changed to quintiles (5 levels).
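As a sketch of what the tightened check might look like, assuming quintiles are coded 1 to 5 and missing values are allowed. This is plain Python and does not reflect how validate.py is actually written; VALID_QUINTILES and invalid_townsend_quintiles are hypothetical names.

```python
from typing import Iterable, List, Optional

# Assumed coding: quintiles run from 1 to 5 inclusive.
VALID_QUINTILES = frozenset(range(1, 6))


def invalid_townsend_quintiles(values: Iterable[Optional[int]]) -> List[int]:
    """Return the values that are neither missing nor a valid quintile."""
    return [v for v in values if v is not None and v not in VALID_QUINTILES]
```

With this check, a leftover decile value such as 10 would be reported as invalid, while None would pass as missing.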