Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
Expected Behavior
When you want to generate a random value for a field, you use the option `random=True`.
Current Behavior
This currently only works if an upper bound (i.e. max value) is specified for the column. Upper bounds are implicitly calculated when using the `values` option or the `uniqueValues` option.
The workaround in the current release is to always specify an upper bound using either the `maxValue` option, the `uniqueValues` option, or other options such as `values` that implicitly compute an upper bound for the range of values produced.
Steps to Reproduce (for bugs)
The following code works correctly, generating random data on all columns marked as random except for `customer_id2`.
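The snippet below is not the code from this report. It is a minimal sketch of the workaround described under Current Behavior: before building a spec, check that every column marked `random=True` also sets `maxValue`, `uniqueValues`, or `values`. The column names, ranges, and the helpers `unbounded_random_columns` and `build_spec` are hypothetical, chosen only for illustration. `DataGenerator` and `withColumn` are the usual `dbldatagen` entry points, and `build_spec` assumes an existing Spark session is passed in.

```python
# Options that, per the description above, give a column an upper bound
# either explicitly (maxValue) or implicitly (uniqueValues, values).
BOUNDING_OPTIONS = ("maxValue", "uniqueValues", "values")

# Hypothetical column definitions; names and ranges are illustrative only.
COLUMNS = {
    "bounded_id": dict(colType="long", minValue=1000, maxValue=9999, random=True),
    "unbounded_id": dict(colType="long", minValue=1000, random=True),  # no upper bound
    "status": dict(colType="string", values=["active", "inactive"], random=True),
}


def unbounded_random_columns(columns):
    """Return names of random columns that set none of the bounding options."""
    return [
        name
        for name, opts in columns.items()
        if opts.get("random") and not any(key in opts for key in BOUNDING_OPTIONS)
    ]


def build_spec(spark, columns, rows=1000):
    """Build a dbldatagen spec, refusing random columns that lack an upper bound."""
    import dbldatagen as dg  # imported lazily so the check above runs without Spark

    missing = unbounded_random_columns(columns)
    if missing:
        raise ValueError(f"random columns without an upper bound: {missing}")

    spec = dg.DataGenerator(spark, name="random_demo", rows=rows)
    for name, opts in columns.items():
        spec = spec.withColumn(name, **opts)
    return spec


print(unbounded_random_columns(COLUMNS))  # ['unbounded_id']
```

With these definitions, adding `maxValue` (or `uniqueValues`) to `unbounded_id` makes the check pass, after which `build_spec(spark, COLUMNS).build()` would be expected to produce the DataFrame.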
Context
Your Environment
`dbldatagen` version used: