alan-turing-institute / spc-hpc-pipeline

Azure batch pipeline for the SPC project
MIT License

UKCensusAPI for Scotland seems to lose data #26

Closed AoifeHughes closed 1 year ago

AoifeHughes commented 1 year ago

https://github.com/alan-turing-institute/UKCensusAPI/blob/86c45eb096cf38cdf413c913dc67e1e8e9642c4a/ukcensusapi/NRScotland.py#L236 This line, used in Step 1 of the HPC script, returns an empty dataframe. That causes errors, yet further analysis still seems to go ahead.
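A fail-fast check right after that call would at least stop the later steps from running on nothing. This is only a rough sketch: `require_rows` is a hypothetical helper, not something in UKCensusAPI or the pipeline, and it assumes the returned object supports `len()` (a pandas DataFrame does, giving its row count).

```python
def require_rows(table, label):
    """Return `table` unchanged, or raise if it contains no rows."""
    # len() gives the row count for a pandas DataFrame as well as for a
    # plain list, so the same check covers both.
    if len(table) == 0:
        raise ValueError(f"{label}: no rows returned, stopping before further analysis")
    return table


# Hypothetical usage with a plain list standing in for the dataframe.
rows = require_rows(["example row"], "QS103SC")
```

Raising here makes the empty result the first visible failure, rather than whatever error it triggers further downstream.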

AoifeHughes commented 1 year ago

I can't tell if this is intentional, given the region code swapping that happens. If so, why does the function not return earlier? Is data expected here?

AoifeHughes commented 1 year ago

The "workaround" statement just above this line is concerning and is the reason for this: it leads to the geographies being missing and not aligning.

AoifeHughes commented 1 year ago

~Oh wait... it might be a typing issue of geography code!~

AoifeHughes commented 1 year ago

None of the files in the cached Scotland files correlate with what's being given by the geography codes here... It can't do anything but return empty datasets. Something is being misaligned.

AoifeHughes commented 1 year ago

The parameters passed to Nomis in the getdata function of Nomisweb.py are ignoring places! https://github.com/alan-turing-institute/UKCensusAPI/blob/master/ukcensusapi/Nomisweb.py https://github.com/alan-turing-institute/UKCensusAPI/blob/master/ukcensusapi/Nomisweb.py#L204

AoifeHughes commented 1 year ago

So the reason this doesn't work: self.data_api_sc.get_geog(self.region, "LSOA11") is called to get a list of geography codes, but calling self.data_api_sc._NRScotland__get_rawdata("QS103SC","LSOA11") to get the data itself results in columns which cannot be matched up with those codes! Still scratching my head on this.
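One way to narrow this down might be to compare the two sets of codes directly. A minimal sketch, assuming the codes from get_geog and the geography values from the raw table have each been pulled out as plain lists; `compare_geography_codes` is a hypothetical helper and the codes below are made-up placeholders, not real output from either call.

```python
def compare_geography_codes(expected_codes, raw_codes):
    """Report how two collections of geography codes overlap."""
    # Normalise to stripped strings so that type or whitespace differences
    # (e.g. int vs str codes) do not hide genuine matches.
    expected = {str(code).strip() for code in expected_codes}
    raw = {str(code).strip() for code in raw_codes}
    return {
        "matched": sorted(expected & raw),
        "missing_from_raw": sorted(expected - raw),
        "unexpected_in_raw": sorted(raw - expected),
    }


# Placeholder codes, purely for illustration.
report = compare_geography_codes(
    ["S01000001", "S01000002"],
    ["S01000002 ", "S00000003"],
)
print(report)
```

If "matched" comes back empty on the real data, that would suggest the two calls are using different code schemes altogether, rather than a filter dropping a few rows.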