databrickslabs / dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can be used to generate large simulated or synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
https://databrickslabs.github.io/dbldatagen
License: Other

DBLDatagen broken on 14.3LTS Shared Cluster #250

Closed GeekSheikh closed 4 months ago

GeekSheikh commented 5 months ago

DBR 14.3 LTS shared clusters use Spark Connect. The code base apparently contains references that Spark Connect does not support, and these break the generator on shared clusters.

To replicate, just try to generate data on a 14.3 LTS shared cluster.
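A minimal reproduction might look like the sketch below. It assumes `dbldatagen` is installed on the cluster and that `spark` is the notebook's session; the helper name, column names, and row count are made up for illustration and are not taken from the original report.

```python
def build_sample_dataframe(spark, rows=1000):
    """Build a tiny synthetic DataFrame with dbldatagen.

    Hypothetical repro: the spec below is illustrative only.
    """
    # Imported lazily so the function can be defined even where
    # dbldatagen is not installed.
    import dbldatagen as dg

    spec = (
        dg.DataGenerator(spark, name="repro_data", rows=rows, partitions=4)
        .withColumn("device_id", "integer", minValue=1, maxValue=100)
        .withColumn("reading", "float", minValue=0.0, maxValue=1.0, random=True)
    )
    return spec.build()


# On a DBR 14.3 LTS shared cluster, in a notebook:
# df = build_sample_dataframe(spark)
# df.show(5)
```

Any small spec should do; the point is only to exercise the generator on a Spark Connect session.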

[Image attached in the original issue]

ronanstokes-db commented 5 months ago

The PR that fixes the lack of `sparkContext` is https://github.com/databrickslabs/dbldatagen/pull/248
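For code that has to run on both classic and Spark Connect sessions, one common pattern is to guard every `sparkContext` access and fall back to a supplied default. The sketch below illustrates that general idea only; it is not necessarily what the PR does, and `get_default_parallelism` and its `fallback` parameter are hypothetical names.

```python
def get_default_parallelism(spark, fallback=8):
    """Return spark.sparkContext.defaultParallelism, or `fallback` if unavailable."""
    try:
        return spark.sparkContext.defaultParallelism
    except Exception:
        # Assumption: on a Spark Connect session, touching sparkContext
        # raises instead of returning a context, so use the fallback.
        return fallback
```

Catching a broad `Exception` is deliberate here, since the error type raised can differ between runtimes.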

ronanstokes-db commented 4 months ago

Fixed in v0.3.6
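To check whether an environment already has the fixed release, something like the following sketch could be used. It assumes the package is installed under the distribution name `dbldatagen`; the helper names are hypothetical, and the version parsing is deliberately simplistic.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_VERSION = (0, 3, 6)  # release named in this thread


def parse_version(text):
    """Turn '0.3.6' or 'v0.3.6' into a tuple of ints, ignoring suffixes."""
    parts = []
    for piece in text.lstrip("v").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed):
    """True if the given version string is at or above the fixed release."""
    return parse_version(installed) >= FIXED_VERSION


def installed_dbldatagen_has_fix():
    """Check the installed package; returns None if it is not installed."""
    try:
        return has_fix(version("dbldatagen"))
    except PackageNotFoundError:
        return None
```

If the check returns `False`, upgrading the package (for example with `%pip install --upgrade dbldatagen` in a notebook) should pick up the fix.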