FINRAOS / DataGenerator

DataGenerator is a Java library for systematically producing large volumes of data. DataGenerator frames data production as a modeling problem, with a user providing a model of dependencies among variables and the library traversing the model to produce relevant data sets.
http://finraos.github.io/DataGenerator
Apache License 2.0

Setting max number of lines doesn't appear to work for DataDistributor #298

Closed: anjueappen closed 8 years ago

anjueappen commented 8 years ago

The number of lines passed in via the command line is 80. Regardless of this number, the output is always 5 lines.

dist = dist.setMaxNumberOfLines(80);

The default example (output to console) doesn't appear to set maxNumberOfLines for the distributor, yet it produces the same number of lines on each run. Is this variable set elsewhere in the code?

mibrahim commented 8 years ago

We'll try to reproduce it. As a general rule, the number of lines is a maximum, so if your model generates fewer lines than that, the output simply ends when the data runs out. It also won't duplicate or repeat lines to reach the max.
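To make the "maximum, not target" behavior concrete, here is a minimal sketch in plain Java. It is not DataGenerator code: the class `MaxLinesCap` and its `emit` method are hypothetical names used only for illustration, and the real library may enforce the limit differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a line cap; not part of the DataGenerator API.
class MaxLinesCap {
    // Emits at most maxLines rows and never repeats rows to pad the output.
    static List<String> emit(List<String> rows, long maxLines) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            if (out.size() >= maxLines) {
                break; // cap reached, stop early
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(emit(rows, 80).size()); // prints 5: only 5 rows exist
        System.out.println(emit(rows, 3).size());  // prints 3: the cap applies
    }
}
```

With five available rows and a cap of 80, the output is still five rows, which matches the symptom reported above.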

anjueappen commented 8 years ago

Thanks for your help! We tried increasing the number of permutations across the columns, and that fixed it. It was as you mentioned: the engine didn't duplicate any lines to meet the maximum, and we had too few permutations in our code to reach it.
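For anyone hitting the same thing, a rough way to estimate the ceiling on output size is to multiply the number of distinct values per column. The sketch below assumes every column varies independently, which may not hold for a given model; `PermutationCount` is a hypothetical helper, not something the library provides.

```java
// Hypothetical helper: upper bound on distinct rows for independent columns.
class PermutationCount {
    // Multiplies the distinct-value counts; throws ArithmeticException on long overflow.
    static long count(int... distinctValuesPerColumn) {
        long total = 1;
        for (int n : distinctValuesPerColumn) {
            total = Math.multiplyExact(total, (long) n);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count(5));       // prints 5: one column with 5 values
        System.out.println(count(5, 4, 4)); // prints 80: enough to fill an 80-line max
    }
}
```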

We are finding, however, that as we increase the line count into the tens of millions, all the threads appear to block indefinitely, or until the JVM runs out of memory.

The log file we had can be found here [https://raw.githubusercontent.com/anjueappen/391DataGeneration/master/hs_err_pid26610.log]

mibrahim commented 8 years ago

@anjueappen yes, you are correct. If you try to generate more data than fits in the heap allocated to your JVM, it will eventually run out of memory and error out.
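As a general JVM-level illustration (not a description of how DataGenerator buffers its output), the sketch below writes each line as it is produced instead of collecting everything in memory first, and prints the heap ceiling the JVM is running with. `StreamingLines` and the `row-N` content are made up for the example; the heap ceiling itself is controlled by the standard `-Xmx` JVM option.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical illustration: stream rows to a writer rather than holding them all on the heap.
class StreamingLines {
    // Writes count lines one at a time and returns how many were written.
    static long writeLines(PrintWriter out, long count) {
        long written = 0;
        for (long i = 0; i < count; i++) {
            out.print("row-" + i + "\n"); // each line can be collected once written
            written++;
        }
        out.flush();
        return written;
    }

    public static void main(String[] args) {
        // Maximum heap the JVM will attempt to use, in bytes.
        System.out.println("Max heap (bytes): " + Runtime.getRuntime().maxMemory());

        // A StringWriter keeps the demo self-contained; a real run would target a file or stream.
        StringWriter sink = new StringWriter();
        long n = writeLines(new PrintWriter(sink), 3);
        System.out.println(n + " lines written");
    }
}
```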

As we understand it, you've found a solution to the original issue, so I'll close this bug. Please let us know if you have other questions or issues. Thanks