Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (`dbldatagen`) can generate large simulated or synthetic data sets for testing, POCs, and other uses in Databricks environments, including Delta Live Tables pipelines.
Types of changes
What types of changes does your code introduce to dbldatagen?
Put an x in the boxes that apply.
[ ] Bug fix (non-breaking change which fixes an issue)
[x] New feature (non-breaking change which adds functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
[x] Change to tutorials, tests or examples
[ ] Non code change (readme, images or other non-code assets)
[ ] Documentation Update (if none of the other choices apply)
Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask. We're here to help!
This is simply a reminder of what we are going to look for before merging your code.
[x] Lint and unit tests pass locally with my changes
[x] I have added tests that prove my fix is effective or that my feature works
[x] I have added necessary documentation (if appropriate)
[ ] Any dependent changes have been merged and published in downstream modules
[ ] Submission does not reduce code coverage numbers
[x] Submission does not increase alerts or messages from prospector / lint
Proposed changes
Added the following standard datasets:
basic/geometries: Creates a set of geometries represented as WKT (Well-Known Text)
basic/process_historian: Simulates process historian data with device ID, timestamp, tag, and value
basic/telematics: Simulates GPS tracking data with device ID, timestamp, latitude, longitude, and heading
benchmark/groupby: A benchmarking dataset with IDs, groups, and values of various types
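The datasets above are generated by dbldatagen on Spark. The plain-Python sketch below illustrates the row shapes they describe. It uses neither Spark nor dbldatagen. It produces records with the columns listed for `basic/telematics` and formats WKT strings like those `basic/geometries` is described as producing. The function names, column names, value ranges, and geometry types are hypothetical choices for this illustration. They are not the datasets' actual schema or implementation.

```python
import random
from datetime import datetime, timedelta

# Hypothetical sketch, NOT the dbldatagen implementation: mimics the columns
# described for basic/telematics (device ID, timestamp, latitude, longitude,
# heading). Column names and value ranges are illustrative assumptions.
def sample_telematics_rows(n, num_devices=10, seed=42):
    rng = random.Random(seed)          # seeded for repeatable output
    start = datetime(2024, 1, 1)
    rows = []
    for i in range(n):
        rows.append({
            "device_id": rng.randrange(num_devices),
            "ts": start + timedelta(seconds=i),   # one reading per second
            "lat": rng.uniform(-90.0, 90.0),      # degrees
            "lon": rng.uniform(-180.0, 180.0),    # degrees
            "heading": rng.randrange(360),        # degrees, 0-359
        })
    return rows

# Hypothetical helpers that format WKT (Well-Known Text) geometries.
# The geometry types basic/geometries actually emits are not assumed here.
def point_wkt(x, y):
    return f"POINT ({x} {y})"

def linestring_wkt(coords):
    return "LINESTRING (" + ", ".join(f"{x} {y}" for x, y in coords) + ")"

def polygon_wkt(ring):
    # A WKT polygon ring must be closed (first point == last point),
    # so close it here if the caller did not.
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]
    return "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"
```

For example, `point_wkt(30, 10)` returns `"POINT (30 10)"`. `sample_telematics_rows(3)` returns three dicts keyed by `device_id`, `ts`, `lat`, `lon`, and `heading`.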
Further comments
N/A