Open ajtucker opened 4 years ago
Stage 2 transform completed (I think). I expect to come back and rework it, as I may have introduced confusion by using more dataframes than necessary. I also have a few questions based on how I have interpreted the spec:
"Remove 'kWh'" - from the measure type value or remove the whole unit column?
Is "Geography Level" the indicated string (e.g. "Local Authority") or the column value of local authority?
What happens to MSOA Code in tidy_d3? I removed it.
Just want to check that there are no "Unallocated" values in tidy_d4
Do I need to reformat year values to "year/{2017}"?
I have not included the marker column because there are no datamarkers in any distribution - have I missed something or is it always needed?
I have output 4 CSV files (joined sales, mean and median + post code sales). Do you want all the individual datasets broken down by measure type?
Thank you!
Answers to above questions:
Remove the columns 'Measure Type' and 'Unit'. We can currently only have one measure type and one unit per dataset, so rather than have a whole column with the same value we define the values in the info.json file.
Geography Level should be local-authority, lsoa, msoa etc. This column will probably not be needed in the future, but I have put it in there just to be able to keep track of things.
We only need to upload the LSOA data as all the geography codes and their relationship to each other have been uploaded to PMD4 by Swirrl so LSOA codes will be able to refer back to their MSOA code at some point in the future.
Yes, year should be formatted as "year/{2017}" etc.
My mistake, it looks like the 'Marker' column is not needed.
You have done the right thing by changing the measure type in the info.json data for each dataset. I have started doing this myself, and it is a nice workaround until we can process multiple measures.
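The answers above can be sketched as a small per-row transform. This is only an illustration under stated assumptions, not the code used in the actual pipeline: the column names `Year` and `Value`, the helper names `pathify` and `tidy_row`, and the sample row are all made up for the example, and it assumes that "year/{2017}" means the literal prefix `year/` followed by the year.

```python
def pathify(label: str) -> str:
    """Lower-case a label and join its words with hyphens (hypothetical helper)."""
    return "-".join(label.strip().lower().split())


def tidy_row(row: dict) -> dict:
    """Apply the agreed changes to one observation row (hypothetical helper)."""
    # Drop the single-valued columns; per the answers, the measure type
    # and unit are declared once in info.json instead.
    out = {k: v for k, v in row.items() if k not in ("Measure Type", "Unit")}
    # "Local Authority" -> "local-authority", "LSOA" -> "lsoa", and so on.
    out["Geography Level"] = pathify(out["Geography Level"])
    # Assumption: "year/{2017}" stands for the prefix "year/" plus the year.
    out["Year"] = f"year/{int(out['Year'])}"
    return out


# Made-up sample row, for illustration only.
rows = [
    {
        "Year": "2017",
        "Geography Level": "Local Authority",
        "Measure Type": "Sales",
        "Unit": "kWh",
        "Value": "123.4",
    },
]
tidy = [tidy_row(r) for r in rows]
```

The same steps map directly onto a pandas dataframe (drop two columns, then map the two remaining ones), but plain dictionaries keep the sketch free of dependencies.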
Data has been published as 4 datasets
The first 3 could be joined up once we can have multiple measures, but maybe Mean and Median consumption should be Attributes anyway. Post Codes are a different format to the LA, LSOA and MSOA so need to be kept separate. I have not published the Mean and Median Consumption for Post Codes, but they could easily be added as Attributes when needed.
BA comments:
The postcode field is a URI.
The number of meters has not been included in any of the datasets due to the inability to have more than one measure type in a dataset on PMDv4.
The description under each title needs to be updated to reflect the content?
Incorrect contents issued date of 28 March 2019 has been used and the correct date is 08 April 2019.
The usual caveats also apply.
3 spreadsheets for MSOA, LSOA and Postcode levels. Lots of data, but it should be straightforward to transform.