mahmoudibrahim98 opened this issue 3 years ago
Also, could you explain how using k=3 differs from using k=2?
Please try k=1. k is the number of parents for each node in the constructed Bayesian network, and the running time/complexity of DataSynthesizer increases dramatically with k. When k=0, its value is determined automatically and could be very large.
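The following back-of-envelope sketch illustrates why the cost grows so quickly with k. It counts how many (child, parent set) candidates a greedy Bayesian-network search must score for a 52-column table. It assumes a PrivBayes-style greedy construction in which each attribute not yet in the network is paired with every size-min(k, i) subset of the i attributes already added. The function name `candidate_parent_sets` is hypothetical and not part of DataSynthesizer's API. The count also ignores the per-candidate cost of computing mutual information over every row.

```python
from math import comb


def candidate_parent_sets(num_attributes: int, k: int) -> int:
    """Count (child, parent-set) pairs scored by a PrivBayes-style greedy search.

    Assumption (not taken from DataSynthesizer's source): at each step, every
    attribute not yet in the network is paired with every subset of size
    min(k, |chosen|) drawn from the attributes already in the network.
    """
    total = 0
    # The first attribute enters the network with no parents, so the
    # greedy search starts with one attribute already chosen.
    for chosen in range(1, num_attributes):
        remaining = num_attributes - chosen
        total += remaining * comb(chosen, min(k, chosen))
    return total


if __name__ == "__main__":
    # 52 columns, matching the dataset described in this issue.
    for k in (1, 2, 3, 4):
        print(f"k={k}: {candidate_parent_sets(52, k):,} candidate pairs")
```

Under this assumption, each increment of k multiplies the number of candidates by a large factor for a 52-column table. This is consistent with the advice to try k=1 when the number of columns is high.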
Description
Hello, I am using DataSynthesizer to generate synthetic data for research purposes. I've been using this package for months, and it works perfectly with small datasets. However, with bigger datasets, especially ones with many columns, running time becomes a problem. A single dataset (71,236 instances and 52 columns) took more than 18 hours to synthesize on a 64-core machine (with degree_of_bayesian_network = 0). I also tried decreasing degree_of_bayesian_network to 2 instead of the default 0. The running time dropped, although the quality of the synthetic data also decreased, but it still takes too long. What do you suggest? Is there a better way you would recommend to approach bigger datasets?
What I Did