Overview
The project uses machine learning to generate synthetic data that can replace real data in data processing and analysis. This is particularly useful where the real data is sensitive (e.g. microdata, medical records, defence data). The methods developed in the project can also be used for imputation. An explicit objective of the project is that the output synthetic datasets are not disclosive, that is, they do not contain identifiable data. Publicly available (open) data will be used initially; once the developed methods have matured, they will be applied to ONS data such as Trade-ITIS.
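The non-disclosure objective can be checked, at least crudely, by comparing synthetic records against the real ones. The sketch below uses two common simple checks, assuming categorical records stored as dictionaries: the share of synthetic records that exactly copy a real record, and the share that match a real record which is unique on a chosen set of quasi-identifiers (such records are the most re-identifiable). The function names, the example columns and the choice of quasi-identifiers are hypothetical; this is an illustration, not the project's or ONS's disclosure-control method, and it sets no acceptance threshold.

```python
from collections import Counter


def exact_match_rate(real_rows, synthetic_rows, columns):
    """Share of synthetic records identical to some real record on `columns`."""
    if not synthetic_rows:
        return 0.0
    real_keys = {tuple(r[c] for c in columns) for r in real_rows}
    hits = sum(tuple(s[c] for c in columns) in real_keys for s in synthetic_rows)
    return hits / len(synthetic_rows)


def unique_match_rate(real_rows, synthetic_rows, quasi_identifiers):
    """Share of synthetic records matching a real record that is unique
    on the (hypothetical) quasi-identifier columns."""
    if not synthetic_rows:
        return 0.0
    counts = Counter(tuple(r[c] for c in quasi_identifiers) for r in real_rows)
    unique = {key for key, n in counts.items() if n == 1}
    hits = sum(tuple(s[c] for c in quasi_identifiers) in unique for s in synthetic_rows)
    return hits / len(synthetic_rows)


# Toy example with made-up records.
real = [
    {"age": "30-39", "region": "North", "income": "High"},
    {"age": "30-39", "region": "South", "income": "Low"},
    {"age": "40-49", "region": "North", "income": "Low"},
]
synth = [
    {"age": "30-39", "region": "North", "income": "High"},  # exact copy of a real record
    {"age": "30-39", "region": "North", "income": "Low"},
    {"age": "40-49", "region": "South", "income": "High"},
    {"age": "40-49", "region": "South", "income": "Low"},
]
print(exact_match_rate(real, synth, ["age", "region", "income"]))  # 0.25
print(unique_match_rate(real, synth, ["age", "region"]))  # 0.5
```

A low exact-match rate alone does not prove a dataset is safe; the quasi-identifier check shows that records which never copy a real row can still point at a unique real individual.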
What is the data science
We investigate several state-of-the-art algorithms for generating synthetic data, such as generative adversarial networks (GANs), variational autoencoders (VAEs) and autoregressive models. Because the project involves big data, we also work on efficient implementations of these algorithms on graphics processing units (GPUs).
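As a sketch of the autoregressive idea only (not the project's implementation), the code below factorises the joint distribution of categorical columns with the chain rule, P(x_1, ..., x_k) = P(x_1) P(x_2 | x_1) ... P(x_k | x_1, ..., x_{k-1}), estimates each conditional by counting, and samples columns left to right. The class name, the `order` parameter and the example columns are hypothetical. Conditioning each column on all earlier columns would simply resample real records, which defeats the non-disclosure objective, so the context is truncated to the previous `order` columns; this lets new combinations appear at some cost in fidelity. GANs, VAEs and neural autoregressive models replace the count tables with neural networks, which is where GPU implementations matter; this stdlib sketch does not attempt that.

```python
import random
from collections import Counter, defaultdict


class CategoricalAutoregressive:
    """Hypothetical count-based autoregressive synthesiser for categorical records."""

    def __init__(self, columns, order=1):
        self.columns = list(columns)
        self.order = order  # number of preceding columns each column conditions on
        self.tables = []

    def _context(self, record, i):
        # Values of (up to) the `order` columns immediately before column i.
        start = max(0, i - self.order)
        return tuple(record[c] for c in self.columns[start:i])

    def fit(self, rows):
        # One conditional frequency table per column: context -> Counter of values.
        self.tables = []
        for i, col in enumerate(self.columns):
            table = defaultdict(Counter)
            for row in rows:
                table[self._context(row, i)][row[col]] += 1
            self.tables.append(table)
        return self

    def sample(self, n, seed=None):
        # Draw each column in turn from its conditional given the values sampled so far.
        # Every sampled context was observed in the training data, so no lookup is empty.
        rng = random.Random(seed)
        synthetic = []
        for _ in range(n):
            record = {}
            for i, col in enumerate(self.columns):
                counts = self.tables[i][self._context(record, i)]
                values = list(counts)
                record[col] = rng.choices(values, weights=[counts[v] for v in values])[0]
            synthetic.append(record)
        return synthetic


rows = [
    {"region": "North", "sector": "Retail", "size": "Small"},
    {"region": "North", "sector": "Finance", "size": "Large"},
    {"region": "South", "sector": "Retail", "size": "Large"},
    {"region": "South", "sector": "Finance", "size": "Small"},
]
model = CategoricalAutoregressive(["region", "sector", "size"], order=1).fit(rows)
synth = model.sample(200, seed=0)
```

With `order=1`, a record such as North / Retail / Large can be generated even though it does not occur in `rows`, while every adjacent pair of values (for example sector and size) is one that was observed.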
What is the impact
The project will provide a safer way to share data where the real data is sensitive (e.g. microdata, medical records, defence data). It will also make sharing data between the research community and ONS easier and faster. In addition, the project is linked to several current ONS Data Science projects (Trade, Housing, etc.).
Partners
In this project we work closely with ONS Methodology, the ONS Trade team and the United Nations Global Platform.