Open CaiusDai opened 9 months ago
Please let me know if any configurable factor is not reasonable or more factors are needed. I will work on the logical data generator first.
Data distribution for each column: number of distinct values and value occurrence distribution?
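To make the question concrete, here is a minimal sketch of how a per-column distribution could be configured. The function name `generate_column` and the parameters `num_distinct`, `distribution`, and `skew` are hypothetical and not part of any agreed design; it uses only the Python standard library and does not write parquet.

```python
import random

def generate_column(num_rows, num_distinct, distribution="uniform", skew=1.0, seed=0):
    """Return num_rows integers drawn from the pool 0..num_distinct-1.

    Hypothetical sketch: 'uniform' gives every value the same weight,
    'zipf' weights the value of rank r by 1 / r**skew.
    Sampling is random, so a small column may not contain every value.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    values = list(range(num_distinct))
    if distribution == "uniform":
        weights = [1.0] * num_distinct
    elif distribution == "zipf":
        weights = [1.0 / (rank ** skew) for rank in range(1, num_distinct + 1)]
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return rng.choices(values, weights=weights, k=num_rows)
```

For example, `generate_column(1000, 5, "zipf")` returns 1000 values in which `0` is expected to occur most often and `4` least often.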
What is this issue about
A data generator that can generate configurable parquet files is needed for later benchmarks.
Goals
Ideally, the generator will provide two functionalities:
For more flexibility, the generator is also designed to accept a JSON configuration file (for complex or repeatable data generation).
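As a rough illustration of what a JSON configuration might look like and how it could be loaded, here is a small sketch. The schema (`rows`, `columns`, `name`, `distinct_values`, `distribution`) and the names `ColumnSpec` and `parse_config` are assumptions made for this example only; the actual format is still open for discussion.

```python
import json
from dataclasses import dataclass

@dataclass
class ColumnSpec:
    # Hypothetical per-column settings; field names are illustrative only.
    name: str
    distinct_values: int
    distribution: str = "uniform"

def parse_config(text):
    """Parse a JSON config string into (row_count, list of ColumnSpec)."""
    raw = json.loads(text)
    columns = [ColumnSpec(**col) for col in raw["columns"]]
    return raw["rows"], columns

# Example configuration using the assumed schema.
EXAMPLE_CONFIG = """
{
  "rows": 1000,
  "columns": [
    {"name": "id", "distinct_values": 1000},
    {"name": "category", "distinct_values": 8, "distribution": "zipf"}
  ]
}
"""

rows, columns = parse_config(EXAMPLE_CONFIG)
```

Keeping the configuration in a file like this would let the same dataset be regenerated later for a repeat benchmark run.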
Generator Configurations
Configurable factors for Logical Data Generator
Configurable factors for Physical Data Generator