datasciencecampus / synthetic-data

Repo on generating synthetic data using GAN

BEIS National Energy Efficiency Data-Framework (NEED) - Synthesis #22

Open Yiannis20 opened 5 years ago

Yiannis20 commented 5 years ago

Following the data exploration phase, the synthetic data generation platform developed at the ONS Data Science Campus will be applied to the NEED dataset.

Yiannis20 commented 5 years ago

The synthetic data generation platform developed at the ONS Data Science Campus has now been applied to the NEED dataset.

Preprocessing came first and included handling categorical variables, removing redundant variables, and scaling.
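For readers who want a concrete picture of these steps, here is a minimal sketch. It is not the platform's actual pipeline. It assumes one-hot encoding for the categorical variables, treats constant columns as the "redundant" ones, and uses min-max scaling; the thread does not say which methods were used. The `preprocess` helper and the toy column names are hypothetical and do not come from the NEED schema.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, categorical: list) -> pd.DataFrame:
    """Illustrative preprocessing: encode, drop redundant columns, scale."""
    # Assumption: categorical variables are one-hot encoded.
    encoded = pd.get_dummies(df, columns=categorical, dtype=float)
    # Assumption: "redundant" means a column that never varies.
    varying = encoded.loc[:, encoded.nunique() > 1]
    # Assumption: min-max scaling of every remaining column to [0, 1].
    lo, hi = varying.min(), varying.max()
    return (varying - lo) / (hi - lo)


# Toy stand-in data; these columns are made up for illustration.
toy = pd.DataFrame({
    "region": ["north", "south", "south", "east"],
    "floor_area": [50.0, 120.0, 85.0, 85.0],
    "gas_kwh": [8000.0, 15000.0, 11000.0, 9500.0],
    "survey_year": [2012, 2012, 2012, 2012],  # constant, so dropped
})

prepared = preprocess(toy, categorical=["region"])
print(prepared.round(2))
```

Scaling to a fixed range is a common choice when the downstream model is a neural network, since it keeps all inputs on a comparable scale.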

Autoencoders were then used to generate synthetic data. The quality of the generated data was assessed by comparing its correlation structure with that of the real data. After several rounds of optimisation, the statistical reconstruction accuracy was satisfactory. The results are shown in Figures 1 and 2.

Figure 1 (image: beis_real): Correlation structure for the real data of the NEED dataset.

Figure 2 (image: beis_synth): Correlation structure for the synthetic data generated with the Data Science Campus platform for the NEED dataset.
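A correlation structure comparison of this kind can also be summarised as a single number. The sketch below is one possible version and is not necessarily how the platform scores it. It assumes Pearson correlations and a mean absolute difference over the off-diagonal entries. The `correlation_gap` function is hypothetical, and the "synthetic" arrays are random stand-ins, not autoencoder output or NEED data.

```python
import numpy as np


def correlation_gap(real, synthetic):
    """Mean absolute difference between off-diagonal Pearson correlations."""
    real_corr = np.corrcoef(real, rowvar=False)
    synth_corr = np.corrcoef(synthetic, rowvar=False)
    # Ignore the diagonal, which is always 1 in both matrices.
    off_diag = ~np.eye(real_corr.shape[0], dtype=bool)
    return float(np.abs(real_corr - synth_corr)[off_diag].mean())


rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.8, 0.3],
                [0.8, 1.0, 0.5],
                [0.3, 0.5, 1.0]])

# Stand-in for the real data: correlated Gaussian columns.
real = rng.multivariate_normal(np.zeros(3), cov, size=2000)
# Stand-in for a good generator: a fresh sample from the same distribution.
faithful = rng.multivariate_normal(np.zeros(3), cov, size=2000)
# Stand-in for a poor generator: correct marginals, correlations destroyed.
shuffled = np.column_stack([rng.permutation(real[:, j]) for j in range(3)])

gap_faithful = correlation_gap(real, faithful)
gap_shuffled = correlation_gap(real, shuffled)
print(f"faithful: {gap_faithful:.3f}  shuffled: {gap_shuffled:.3f}")
```

A small gap means the synthetic data reproduces the pairwise linear relationships of the real data. It says nothing about higher-order structure or disclosure risk, so it complements a visual comparison like Figures 1 and 2 and does not replace other checks.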