ConnectedPlacesCatapult / TomboloDigitalConnector

The Tombolo Digital Connector enables users to combine different sources of data in a transparent and reproducible way.
MIT License

Initial feedback on spatial granularity: from LSOA level down to postcode and address #549

Open steven4320555 opened 6 years ago

steven4320555 commented 6 years ago

First of all, I only came across the Digital Connector this Friday (16th March 2018), so this feedback is preliminary, but it may be relevant for the future development of TomboloDigitalConnector. (I am submitting this issue as part of the CityDataHack; I am in team KeepGoing.)

Generally speaking, the Digital Connector is a powerful open-source platform for integrating data from different sources, in different formats and at different spatial granularities. The idea of integrating data through a well-defined metadata structure and data integration system is very useful, and can help people get started with analysing data. However, the finest spatial granularity currently supported is LSOA level. This could be improved with some openly available lookups, such as the National Statistics UPRN Lookup (NSUL) for Great Britain http://geoportal.statistics.gov.uk/datasets?q=NSUL and the National Statistics Postcode Lookup (NSPL) http://geoportal.statistics.gov.uk/datasets?q=NSPL, which map individual addresses (UPRNs) and postcodes, respectively, to output areas (OAs), LSOAs and higher-level geographies. With these lookups, data at finer spatial granularities could be integrated, for example:
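To make the lookup idea concrete, here is a minimal, standalone Python sketch (not Digital Connector code) of attaching OA/LSOA/MSOA codes to postcode-level records via an NSPL-style CSV. It assumes the NSPL column names `pcds`, `oa11`, `lsoa11` and `msoa11` (check these against the release you download); the function names, postcodes and area codes are hypothetical and made up for illustration. An NSUL join would presumably work the same way, keyed on UPRN instead of postcode.

```python
import csv
import io


def normalise_postcode(postcode):
    """Uppercase and drop all whitespace, so 'zz1 1aa' and 'ZZ11AA' compare equal."""
    return "".join(postcode.split()).upper()


def load_postcode_lookup(csv_text):
    """Build {normalised postcode: {'oa', 'lsoa', 'msoa'}} from NSPL-style CSV text.

    ASSUMPTION: column names 'pcds', 'oa11', 'lsoa11', 'msoa11' follow the
    2011-geography NSPL releases; verify them against the actual file.
    """
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        lookup[normalise_postcode(row["pcds"])] = {
            "oa": row["oa11"],
            "lsoa": row["lsoa11"],
            "msoa": row["msoa11"],
        }
    return lookup


def attach_geography(records, lookup):
    """Copy each postcode-level record and add its OA/LSOA/MSOA codes.

    Records whose postcode is not in the lookup (typos, terminated or new
    postcodes) are returned separately rather than silently dropped.
    """
    matched, unmatched = [], []
    for record in records:
        geography = lookup.get(normalise_postcode(record["postcode"]))
        if geography is None:
            unmatched.append(record)
        else:
            matched.append({**record, **geography})
    return matched, unmatched


# Made-up sample rows and codes, for illustration only.
NSPL_SAMPLE = """pcds,oa11,lsoa11,msoa11
ZZ1 1AA,E00000001,E01000001,E02000001
ZZ1 1AB,E00000002,E01000001,E02000001
"""

lookup = load_postcode_lookup(NSPL_SAMPLE)
epc_records = [
    {"postcode": "zz1 1aa", "rating": "C"},
    {"postcode": "ZZ9 9ZZ", "rating": "D"},
]
matched, unmatched = attach_geography(epc_records, lookup)
print(matched[0]["lsoa"], len(unmatched))
```

Keeping unmatched records in their own list makes it easy to report how much of a dataset could not be placed, which matters when comparing coverage across sources.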

  1. EPC data https://epc.opendatacommunities.org/ : the Standard Assessment Procedure (SAP, based on the BRE Domestic Energy Model) has been used by the UK government to assess and compare the energy performance of dwellings, and to generate an EPC before a property can be put on the market for sale or rent. Property-level EPC data used to be publicly available only in PDF format, but EPC ratings and some of the underlying data (including address and postcode, built form, floor area, estimated heating cost, etc.) were released on 27 March 2017 for over 15 million individual properties in England and Wales. Key assumptions for deriving an EPC rating are: 1) the number of occupants is a function of floor area, and homes are assumed to be heated according to a typical heat demand and heating pattern; 2) a UK average temperature is used to derive the EPC rating, so the rating is consistent across time and place; 3) postcode-district-level weather and current fuel prices (contained in the SAP PRODUCT CHARACTERISTICS DATA FILE) are used to calculate energy costs and potential savings after retrofit; 4) cooking and electrical appliance usage is not included in the energy consumption, except in the Green Deal occupancy assessment; 5) building age bands define the typical thermal performance of the building.
  2. Postcode-level gas and electricity consumption https://www.gov.uk/government/collections/sub-national-gas-consumption-data#postcode-level-data https://www.gov.uk/government/collections/sub-national-electricity-consumption-data#postcode-level-data : the subnational energy statistics, aggregated from meter points, are released at increasingly fine levels of detail. The finest granularity is postcode-level gas and electricity consumption, published only for postcodes with more than 6 gas/electricity meters; data are available for 2013 and 2015.
  3. 2011 Census data (https://www.ons.gov.uk/census/2011census/2011censusdata): provides a snapshot of family size, age, residency, work, health, housing, etc. for output areas of around 125 households. At postcode level, the enumeration postcode population and household estimates (https://www.nomisweb.co.uk/census/2011/postcode_headcounts_and_household_estimates) are also used.
  4. AddressBase Premium (https://www.ordnancesurvey.co.uk/business-and-government/products/addressbase-premium.html) via the Geovation Hub (https://geovation.uk/hub/#what ): provides more than 37 million individual addresses with building classifications. The National Statistics Postcode Lookup links postcode-level information to OAs and higher-level statistical areas.

The hierarchical data framework is built on the relationships between the different datasets and the location-based digital infrastructure. Other datasets in brief (in increasing level of aggregation): over 22 million Price Paid records for properties sold in England and Wales; OS MasterMap Topography Layer and building height estimates; land ownership from INSPIRE Index Polygons; LiDAR data at 50 cm or 1 m resolution; the REFIT Smart Home dataset for 20 homes in Loughborough; 11 million Zoopla property listings via UBDC; Code-Point with Polygons from 2011 to 2017; fuel poverty statistics and 'hard-to-treat' data at OA and LSOA level; domestic energy consumption, property age band and council tax band at LSOA level; and non-domestic energy consumption at MSOA level.
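Item 2 and the postcode lookup in item 4 suggest a simple way to bring postcode-level consumption up to LSOA level. This minimal Python sketch assumes hypothetical field names `postcode`, `meters` and `total_kwh` (the published files may use different headers) and a ready-made postcode-to-LSOA dictionary, for example derived from NSPL; the function name and all sample values are made up. It sums meters and consumption per LSOA, computes a meter-weighted mean, and counts contributing postcodes, because postcodes below the disclosure threshold are not published, so LSOA totals built this way can undercount.

```python
from collections import defaultdict


def normalise_postcode(postcode):
    """Uppercase and drop all whitespace so postcode formats compare equal."""
    return "".join(postcode.split()).upper()


def aggregate_to_lsoa(postcode_rows, postcode_to_lsoa):
    """Roll postcode-level consumption up to LSOA level.

    postcode_rows: dicts with HYPOTHETICAL keys 'postcode', 'meters', 'total_kwh'.
    postcode_to_lsoa: {normalised postcode: LSOA code}, e.g. derived from NSPL.
    Returns ({lsoa: summary}, [unmatched postcodes]).
    """
    totals = defaultdict(lambda: {"meters": 0, "total_kwh": 0.0, "postcodes": 0})
    unmatched = []
    for row in postcode_rows:
        lsoa = postcode_to_lsoa.get(normalise_postcode(row["postcode"]))
        if lsoa is None:
            unmatched.append(row["postcode"])
            continue
        summary = totals[lsoa]
        summary["meters"] += row["meters"]
        summary["total_kwh"] += row["total_kwh"]
        summary["postcodes"] += 1  # only published (above-threshold) postcodes
    result = {}
    for lsoa, summary in totals.items():
        # Meter-weighted mean: total kWh / total meters, not a mean of postcode means.
        mean = summary["total_kwh"] / summary["meters"] if summary["meters"] else None
        result[lsoa] = {**summary, "mean_kwh_per_meter": mean}
    return result, unmatched


# Made-up sample values, for illustration only.
postcode_to_lsoa = {"ZZ11AA": "E01000001", "ZZ11AB": "E01000001", "ZZ12AA": "E01000002"}
gas_rows = [
    {"postcode": "ZZ1 1AA", "meters": 10, "total_kwh": 120000.0},
    {"postcode": "ZZ1 1AB", "meters": 30, "total_kwh": 420000.0},
    {"postcode": "ZZ1 2AA", "meters": 8, "total_kwh": 96000.0},
    {"postcode": "ZZ9 9ZZ", "meters": 7, "total_kwh": 70000.0},
]
by_lsoa, unmatched = aggregate_to_lsoa(gas_rows, postcode_to_lsoa)
print(by_lsoa["E01000001"]["mean_kwh_per_meter"])  # 13500.0
print(unmatched)  # ['ZZ9 9ZZ']
```

In the sample, averaging the two postcode means in E01000001 would give 13,000 kWh, whereas the meter-weighted figure is 13,500 kWh; only the weighted version equals total LSOA consumption divided by total LSOA meters.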

A brief visual overview of the datasets can be found in one of my Twitter posts: https://twitter.com/steven4320555/status/973118322920361984

As I am not a strong coder, implementing the new data sources would need to be done by more experienced developers, but I am happy to share the knowledge I gained while integrating these datasets.

ZHANG Yu (Steven) 3rd year PhD student @ Loughborough University