datasciencecampus / synthetic-data

Repo on generating synthetic data using GAN

Draft Technical report for DSC website #18

Closed SharonHill closed 6 years ago

Yiannis20 commented 6 years ago

Overview

The project generates synthetic data using machine learning, so that the synthetic data can replace real data in processing and analysis. This is particularly useful when the real data is sensitive (e.g. microdata, medical records, defence data). The methods developed as part of the project can also be used for imputation. An explicit objective is that the output synthetic datasets are not disclosive, i.e. they contain no identifiable data. Publicly available (open) data will be used initially; once the methods have matured, they will be applied to ONS data such as Trade-ITIS.

What is the data science

We investigate several state-of-the-art algorithms for generating synthetic data, such as generative adversarial networks (GANs), variational autoencoders (VAEs) and autoregressive models. Since the project involves big data, we also work on efficient implementations of these algorithms on graphics processing units (GPUs).
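To make the autoregressive idea concrete, here is a minimal sketch, not the project's implementation. It models a categorical table column by column: the first column is drawn from its observed frequencies, and each later column is drawn conditional on the previous column only. That first-order conditioning is a simplifying assumption; neural autoregressive models condition on all earlier columns with learned distributions. The function names (`fit_chain`, `sample_rows`) and the toy rows are invented for illustration.

```python
import random
from collections import Counter, defaultdict


def fit_chain(rows):
    """Count first-column values and column-to-next-column transitions."""
    n_cols = len(rows[0])
    first = Counter(row[0] for row in rows)
    # transitions[k][prev][value] = count of `value` in column k+1
    # among rows whose column k equals `prev`
    transitions = [defaultdict(Counter) for _ in range(n_cols - 1)]
    for row in rows:
        for k in range(1, n_cols):
            transitions[k - 1][row[k - 1]][row[k]] += 1
    return first, transitions


def draw(counter, rng):
    """Draw one value with probability proportional to its count."""
    values = list(counter.keys())
    weights = list(counter.values())
    return rng.choices(values, weights=weights, k=1)[0]


def sample_rows(model, n, seed=0):
    """Generate n synthetic rows, one column at a time."""
    first, transitions = model
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        row = [draw(first, rng)]
        for table in transitions:
            # condition only on the value just generated
            row.append(draw(table[row[-1]], rng))
        out.append(tuple(row))
    return out


# Toy data, invented for this example (region, sector, size band).
real_rows = [
    ("north", "retail", "small"),
    ("north", "retail", "large"),
    ("south", "retail", "small"),
    ("south", "farming", "small"),
    ("south", "farming", "large"),
]

model = fit_chain(real_rows)
synthetic_rows = sample_rows(model, 10)
```

Because each column depends only on its predecessor, the sampler can emit combinations that never occur as a whole row in the input (for example `("north", "retail", ...)` paired with any size band seen after `"retail"`). A count-based model like this gives no disclosure guarantee by itself; it only illustrates the column-by-column factorisation.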

What is the impact

The project will provide a safer way to share data when the real data is sensitive (e.g. microdata, medical records, defence data). It will also make sharing data between research communities and ONS easier and faster. In addition, the project is linked to several current ONS Data Science projects (Trade, Housing, etc.).

Partners

In this project we work closely with ONS Methodology, the ONS Trade team and the United Nations Global Platform.