Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
Proposed changes
Added a few standard datasets:
`basic/geometries` - WKT geometries of type `POINT`, `LINESTRING`, or `POLYGON`
`basic/process_historian` - Time-series sensor data simulating values from a process historian system
`basic/telematics` - Time-series location data simulating vehicle telematics
`benchmark/groupby` - Benchmarking data with a set of grouping keys
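
For reviewers unfamiliar with WKT, the sketch below shows the three geometry types named in the `basic/geometries` entry above as plain strings. It is only an illustration in standard-library Python, not the code added in this PR: the helper names (`wkt_point`, `wkt_linestring`, `wkt_polygon`), the coordinate ranges, and the choice of an axis-aligned rectangle for the polygon are all assumptions made for the example.

```python
import random


def _fmt(x, y):
    """Format one coordinate pair as WKT text ("x y")."""
    return f"{x:.4f} {y:.4f}"


def wkt_point(rng):
    """Return a random WKT POINT (hypothetical helper, lon/lat ranges assumed)."""
    return f"POINT ({_fmt(rng.uniform(-180, 180), rng.uniform(-90, 90))})"


def wkt_linestring(rng, num_points=3):
    """Return a random WKT LINESTRING with num_points vertices."""
    coords = ", ".join(
        _fmt(rng.uniform(-180, 180), rng.uniform(-90, 90))
        for _ in range(num_points)
    )
    return f"LINESTRING ({coords})"


def wkt_polygon(rng):
    """Return a WKT POLYGON shaped as an axis-aligned rectangle.

    A rectangle keeps the ring simple (no self-intersection); the first
    vertex is repeated at the end because WKT polygon rings must be closed.
    """
    x0, y0 = rng.uniform(-180, 170), rng.uniform(-90, 80)
    w, h = rng.uniform(0.1, 10), rng.uniform(0.1, 10)
    ring = [(x0, y0), (x0 + w, y0), (x0 + w, y0 + h), (x0, y0 + h), (x0, y0)]
    coords = ", ".join(_fmt(x, y) for x, y in ring)
    return f"POLYGON (({coords}))"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same rows
    for make in (wkt_point, wkt_linestring, wkt_polygon):
        print(make(rng))
```

Seeding the generator keeps the output repeatable, which is usually what you want from test data.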
Types of changes
What types of changes does your code introduce to dbldatagen?
Put an x in the boxes that apply.
[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[x] Change to tutorials, tests or examples
[ ] Non code change (readme, images or other non-code assets)
[ ] Documentation Update (if none of the other choices apply)
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask. We're here to help!
This is simply a reminder of what we are going to look for before merging your code.
[x] Lint and unit tests pass locally with my changes
[x] I have added tests that prove my fix is effective or that my feature works
[x] I have added necessary documentation (if appropriate)
[ ] Any dependent changes have been merged and published in downstream modules
[ ] Submission does not reduce code coverage numbers
[ ] Submission does not increase alerts or messages from prospector / lint
Further comments
I have added several standard datasets along with unit tests where appropriate.