datasciencecampus / synthetic-data

Repo on generating synthetic data using GAN

Synthetic data

Repo on generating synthetic data

The project uses machine learning to generate synthetic data that can stand in for real data in data processing. This is particularly useful where the real data is sensitive (e.g. microdata, medical records, defence data). The methods developed as part of the project can also be used for imputation. Publicly available (open) data will be used initially; once the methods have matured, they will be applied to ONS data such as Trade-ITIS, LFS and Census data.

The main machine learning methods investigated by our team for the generation of synthetic data are generative adversarial networks (GANs), variational autoencoders (VAEs) and auto-regressive models.
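To make the auto-regressive idea concrete, here is a minimal sketch, not the project's implementation. It factorises the joint distribution of a categorical table as p(x1) p(x2 | x1) ... and samples one column at a time. The conditionals are estimated by simple counting rather than by a neural network, and the column meanings, toy records and function names (`fit_autoregressive`, `sample_row`) are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_autoregressive(rows):
    """Estimate p(x_1) p(x_2 | x_1) ... p(x_k | x_1..x_{k-1}) by counting."""
    n_cols = len(rows[0])
    # tables[i] maps a prefix (values of columns 0..i-1) to a Counter over column i.
    tables = [defaultdict(Counter) for _ in range(n_cols)]
    for row in rows:
        for i in range(n_cols):
            tables[i][tuple(row[:i])][row[i]] += 1
    return tables


def sample_row(tables, rng):
    """Draw one synthetic row, one column at a time, conditioning on the prefix."""
    row = []
    for table in tables:
        counts = table[tuple(row)]
        values = list(counts)
        weights = [counts[v] for v in values]
        row.append(rng.choices(values, weights=weights, k=1)[0])
    return tuple(row)


# Hypothetical toy records: (age_band, employment_status). Not real data.
real_rows = [
    ("16-24", "student"),
    ("16-24", "employed"),
    ("25-64", "employed"),
    ("25-64", "employed"),
    ("25-64", "unemployed"),
    ("65+", "retired"),
]

rng = random.Random(0)  # fixed seed so repeated runs draw the same rows
tables = fit_autoregressive(real_rows)
synthetic_rows = [sample_row(tables, rng) for _ in range(5)]
for r in synthetic_rows:
    print(r)
```

Because this sketch conditions on the full prefix and uses raw counts, it can only reproduce combinations that already occur in the input, so on its own it offers no privacy protection. A learned auto-regressive model would replace the count tables with a parametric conditional that generalises beyond the observed rows.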

The Synthetic data project aims to provide a safer way to share data where the real data is sensitive. It should also make sharing data between the research communities and ONS easier and faster. In addition, the project is linked to several current ONS Data Science projects (Trade, Housing, etc.).