fstpackage / synthetic

R package for dataset generation and benchmarking
GNU Affero General Public License v3.0

Method table definition takes correlation into account #36

Status: open. MarcusKlik opened this issue 4 years ago.

MarcusKlik commented 4 years ago

Advanced feature: generate dataset samples from a source dataset while retaining the correlations between its column vectors:

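```r
library(data.table)  # fread()
library(synthetic)   # table_definition()
dt <- fread("some_data.csv")
generator <- table_definition(dt, id = "some data sample", correlate = TRUE)  # `correlate` is the proposed new argument
```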

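This could easily be done by keeping the original data in memory and sampling its rows: because every generated row is a copy of an original row, the correlations between columns are retained automatically. Alternatively, a model of the source dataset could be constructed and used to generate samples.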
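A minimal base-R sketch of both approaches. The helper names `sample_rows()` and `sample_gaussian()` are hypothetical and not part of the synthetic package. For the model-based variant a multivariate normal model (column means plus covariance matrix) is assumed purely for illustration; it only works on numeric columns and only reproduces linear (Pearson) correlation.

```r
# Hypothetical helpers, not part of the synthetic package API.

# Approach 1: keep the source data in memory and resample whole rows.
# Every generated row is a copy of an original row, so the joint
# distribution of the columns (and hence their correlations) is retained.
sample_rows <- function(source, n) {
  df <- as.data.frame(source)            # also accepts a data.table from fread()
  idx <- sample.int(nrow(df), n, replace = TRUE)
  out <- df[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Approach 2: fit a model and sample from it. Assumed model: a multivariate
# normal with the source column means and covariance matrix (numeric
# columns only). chol() fails if the covariance matrix is singular,
# e.g. when columns are perfectly collinear.
sample_gaussian <- function(source, n) {
  x <- as.matrix(as.data.frame(source))
  r <- chol(cov(x))                               # upper triangular: t(r) %*% r == cov(x)
  z <- matrix(rnorm(n * ncol(x)), nrow = n)       # independent standard normals
  samples <- z %*% r                              # rows now have covariance cov(x)
  samples <- sweep(samples, 2, colMeans(x), "+")  # shift to the source means
  colnames(samples) <- colnames(x)
  as.data.frame(samples)
}

# Usage: two strongly correlated columns
set.seed(1)
src <- data.frame(a = rnorm(1000))
src$b <- 2 * src$a + rnorm(1000, sd = 0.5)
cor(src)
cor(sample_rows(src, 5000))      # similar to cor(src)
cor(sample_gaussian(src, 5000))  # similar to cor(src)
```

Row sampling is exact but can only reproduce rows that already exist; the model-based variant can generate new values, at the cost of whatever assumptions the chosen model makes.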

MarcusKlik commented 4 years ago

There should also be an option to take auto-correlation into account, i.e. the correlation between successive values within the same column (thanks @bootje).
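One possible way to retain auto-correlation when sampling rows is a moving-block bootstrap: instead of drawing independent rows, draw random runs of consecutive rows. The sketch below assumes the row order of the source data is meaningful (e.g. a time series); `sample_blocks()` and its `block_len` parameter are hypothetical names. Longer blocks retain auto-correlation at larger lags but yield fewer distinct samples.

```r
# Hypothetical helper, not part of the synthetic package API.
# Moving-block resampling: draw random contiguous blocks of rows and
# concatenate them. Within each block the original row order is kept,
# so auto-correlation up to lag block_len - 1 is approximately retained,
# as are the correlations between columns.
sample_blocks <- function(source, n, block_len = 10L) {
  df <- as.data.frame(source)
  stopifnot(block_len >= 1L, block_len <= nrow(df))
  n_blocks <- ceiling(n / block_len)
  # start positions chosen so that every block fits inside the data
  starts <- sample.int(nrow(df) - block_len + 1L, n_blocks, replace = TRUE)
  # column j of the matrix holds the row indices of block j
  idx <- as.vector(outer(seq_len(block_len) - 1L, starts, "+"))
  out <- df[idx[seq_len(n)], , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Usage: an auto-correlated AR(1) series as source data
set.seed(1)
src <- data.frame(x = as.numeric(arima.sim(list(ar = 0.9), n = 5000)))
blocks <- sample_blocks(src, 4000, block_len = 50L)
acf(blocks$x, lag.max = 1, plot = FALSE)  # lag-1 value stays close to the source's
```

Only the joins between blocks break the original sequence, so the fraction of disturbed neighbouring pairs is roughly `1 / block_len`.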