databrickslabs / dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
https://databrickslabs.github.io/dbldatagen

Changes and bug fixes to support shared clusters in DBR 14.2 #248

Closed ronanstokes-db closed 4 months ago

ronanstokes-db commented 7 months ago

Proposed changes

When using a shared cluster in Unity Catalog (UC) shared mode on DBR 14.2, referencing `sparkContext` raises an attribute error. To handle this, if accessing `sparkContext` raises an error, we fall back to a default parallelism of 200 unless an alternative value was explicitly specified.
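For reviewers, the fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_parallelism`, its `explicit` parameter, and the `DEFAULT_PARALLELISM` constant are hypothetical names, and the only behavior taken from the description is "use 200 if `sparkContext` cannot be accessed and nothing else was specified".

```python
# Hypothetical constant name; the value 200 comes from the PR description.
DEFAULT_PARALLELISM = 200


def resolve_parallelism(spark, explicit=None):
    """Pick a partition count without assuming sparkContext is reachable.

    `spark` is expected to be a SparkSession-like object. On a UC shared
    cluster, touching `spark.sparkContext` may raise, so the access is
    wrapped in a try/except. (Hypothetical helper, for illustration only.)
    """
    # An explicitly specified value always wins.
    if explicit is not None:
        return explicit
    try:
        # Works on single-user clusters and local Spark sessions.
        return spark.sparkContext.defaultParallelism
    except Exception:
        # Any failure to reach the context falls back to the default.
        return DEFAULT_PARALLELISM
```

Catching a broad `Exception` rather than only `AttributeError` is a design assumption here, so that other access failures on restricted clusters also fall back to the default.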

This PR also uses the Spark SQL function `element_at` rather than direct array indexing, because of incompatibilities with some Spark versions.
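One thing to keep in mind when reviewing this swap: Spark SQL bracket indexing on arrays is 0-based, while `element_at` is 1-based, so the index has to be shifted by one. The sketch below shows the idea with a hypothetical string-building helper (`lookup_expr` is not a dbldatagen function, and the values are assumed to be already formatted as SQL literals).

```python
def lookup_expr(sql_literals, index_expr):
    """Build a Spark SQL expression that selects one value from a fixed list.

    `sql_literals` are strings already formatted as SQL literals, and
    `index_expr` is a SQL expression producing a 0-based index.
    (Hypothetical helper, for illustration only.)
    """
    array_sql = "array(" + ", ".join(sql_literals) + ")"
    # Equivalent bracket form would be: array(...)[index_expr]  (0-based).
    # element_at is 1-based, hence the "+ 1".
    return f"element_at({array_sql}, cast({index_expr} as int) + 1)"
```

The resulting string could then be passed to `pyspark.sql.functions.expr`, for example `F.expr(lookup_expr(["10", "20", "30"], "id % 3"))`.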


codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (a6987b2) 92.19% compared to head (22925fd) 92.22%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #248      +/-   ##
==========================================
+ Coverage   92.19%   92.22%   +0.02%
==========================================
  Files          23       23
  Lines        2754     2764      +10
  Branches      471      472       +1
==========================================
+ Hits         2539     2549      +10
  Misses        128      128
  Partials       87       87
```


GeekSheikh commented 4 months ago

It's working for me. We're testing it a little deeper but looks good. Please don't forget to bump the version.