databrickslabs / dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
https://databrickslabs.github.io/dbldatagen

Using Databricks Labs Data Generator on Databricks Runtime 14.x #241

Closed: ronanstokes-db closed this issue 4 months ago

ronanstokes-db commented 9 months ago

Expected Behavior

No errors

Current Behavior

There is an issue when running the Data Generator in a Unity Catalog environment on the 14.1 runtime. When the number of partitions is not specified, the DataGenerator tries to determine an appropriate number itself, and this step fails.

When running with this release on a Unity Catalog enabled shared cluster using runtime release 14.1, you will receive an error (exception) if you don't explicitly specify the number of partitions.

This does not affect the other access modes for Unity Catalog enabled clusters, or the 13.3 LTS runtime (which we recommend for now).

Workaround

The workaround is to explicitly specify the number of partitions when creating the DataGenerator. We are working on a solution and will update this issue over the next couple of days.
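A minimal sketch of the workaround, assuming a notebook where `spark` is already defined. The `partitions` argument is the documented `DataGenerator` parameter; the helper `choose_partitions`, the function `build_test_data`, the constant `ROWS_PER_PARTITION`, and the column names are hypothetical and only there for illustration.

```python
import math

# Hypothetical sizing target; tune it for your cluster and row width.
ROWS_PER_PARTITION = 1_000_000


def choose_partitions(rows: int, rows_per_partition: int = ROWS_PER_PARTITION) -> int:
    """Return an explicit partition count (at least 1) for the given row count."""
    if rows < 0 or rows_per_partition <= 0:
        raise ValueError("rows must be >= 0 and rows_per_partition must be > 0")
    return max(1, math.ceil(rows / rows_per_partition))


def build_test_data(spark, rows: int = 100_000):
    """Build a small synthetic DataFrame with an explicit partition count."""
    # Imported here so the sizing helper above can be used without dbldatagen installed.
    import dbldatagen as dg

    spec = (
        dg.DataGenerator(
            sparkSession=spark,
            name="example_data",
            rows=rows,
            # Passing partitions explicitly means the generator does not
            # have to work out a default on its own.
            partitions=choose_partitions(rows),
        )
        .withColumn("device_id", "long", minValue=1, maxValue=1000, random=True)
        .withColumn("status", "string", values=["active", "inactive"], random=True)
    )
    return spec.build()
```

In a notebook this would be called as `df = build_test_data(spark)`. Any fixed integer works equally well for `partitions`; the helper is just one way to pick a value that scales with the row count.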

Context

This only applies when using the data generator on Databricks Runtime 14.x with a Unity Catalog enabled cluster in shared access mode.

Your Environment