databrickslabs / dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.
https://databrickslabs.github.io/dbldatagen

Feature repeatable text generation #132

Closed: ronanstokes-db closed this 1 year ago

ronanstokes-db commented 2 years ago

Proposed changes

Enhance text generation in the template text generator.

Switch to a vectorized implementation using NumPy / Pandas, and take advantage of the repeatability of NumPy random number generation.

This PR implements a repeatable, vectorized version of TemplateTextGenerator.

It also substantially expands test coverage of the text generators.
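
As an illustration of the idea described above, here is a minimal sketch of repeatable, vectorized template expansion with a seeded NumPy generator. It is not the code from this PR: the helper name `generate_from_template` and the placeholder rules (`d` for a digit, `a` for a lowercase letter, anything else copied literally) are assumptions made for the example. The result could be wrapped in a Pandas Series if needed.

```python
from functools import reduce

import numpy as np

DIGITS = np.array(list("0123456789"))
LETTERS = np.array(list("abcdefghijklmnopqrstuvwxyz"))


def generate_from_template(template, n_rows, seed):
    """Hypothetical helper: expand a template into n_rows strings.

    Assumed placeholder rules (for illustration only):
      'd' -> random digit, 'a' -> random lowercase letter,
      any other character is copied literally.
    """
    # A seeded Generator makes every draw reproducible for a given seed.
    rng = np.random.default_rng(seed)

    columns = []
    for ch in template:
        if ch == "d":
            # One vectorized draw per template position, covering all rows.
            columns.append(DIGITS[rng.integers(0, len(DIGITS), size=n_rows)])
        elif ch == "a":
            columns.append(LETTERS[rng.integers(0, len(LETTERS), size=n_rows)])
        else:
            columns.append(np.full(n_rows, ch))

    # Concatenate the per-position character arrays element-wise.
    return reduce(np.char.add, columns, np.full(n_rows, ""))


first = generate_from_template("ddd-aa", 5, seed=42)
second = generate_from_template("ddd-aa", 5, seed=42)
print(first)
# Same seed and same template, so the two arrays match.
print(np.array_equal(first, second))
```

The random draws are made once per template position rather than once per row, which is where the vectorization comes from; the fixed seed is what makes repeated runs produce identical text.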

codecov[bot] commented 2 years ago

Codecov Report

Merging #132 (2959c26) into master (aea3f84) will increase coverage by 2.58%. The diff coverage is 96.75%.

@@            Coverage Diff             @@
##           master     #132      +/-   ##
==========================================
+ Coverage   86.90%   89.49%   +2.58%     
==========================================
  Files          21       22       +1     
  Lines        2161     2332     +171     
  Branches      367      375       +8     
==========================================
+ Hits         1878     2087     +209     
+ Misses        183      160      -23     
+ Partials      100       85      -15     
| Impacted Files | Coverage Δ |
|---|---|
| `dbldatagen/text_generators.py` | 96.83% <96.75%> (+17.06%) :arrow_up: |
| `dbldatagen/__init__.py` | 90.90% <0.00%> (ø) |
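
The figures in the report above can be cross-checked with a few lines of arithmetic. This is a sketch only: it assumes coverage is hits divided by total lines (hits + misses + partials) and that percentages are truncated, not rounded, to two decimals. That truncation rule is an assumption that happens to reproduce the reported numbers; it is not taken from Codecov documentation.

```python
import math


def truncate2(value):
    """Truncate (not round) to two decimal places; an assumed display rule."""
    return math.floor(value * 100) / 100


def coverage_pct(counts):
    """Coverage as hits over total lines, expressed as a percentage."""
    return 100 * counts["hits"] / sum(counts.values())


# Counts copied from the coverage diff in the report.
before = {"hits": 1878, "misses": 183, "partials": 100}
after = {"hits": 2087, "misses": 160, "partials": 85}

print(sum(before.values()), sum(after.values()))
print(truncate2(coverage_pct(before)), truncate2(coverage_pct(after)))
# The delta is computed from the untruncated percentages, then truncated.
print(truncate2(coverage_pct(after) - coverage_pct(before)))
```

Under these assumptions the line totals come out to 2161 and 2332, matching the "Lines" row of the report.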

lgtm-com[bot] commented 2 years ago

This pull request introduces 1 alert when merging e66760d73c83231991ffab1baa02ee8c18db58da into d6b1799ecf5bcb3be9ac14d3366454da427f52c1 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 2 years ago

This pull request introduces 6 alerts when merging f8eb82b9a8c504b3cafa162e6b58e2fb73cebbfc into d6b1799ecf5bcb3be9ac14d3366454da427f52c1 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 2 years ago

This pull request introduces 6 alerts when merging a3b076f9629ff93764d2c7fb7c7ccc29f57e73b9 into d6b1799ecf5bcb3be9ac14d3366454da427f52c1 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 2 years ago

This pull request introduces 1 alert when merging a38e27f3636fab7c4465e81c9efee705f8096343 into d6b1799ecf5bcb3be9ac14d3366454da427f52c1 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 2 years ago

This pull request introduces 1 alert when merging a316c097a4ab80801195347b02ad0e08227360c5 into d6b1799ecf5bcb3be9ac14d3366454da427f52c1 - view on LGTM.com

new alerts:

lgtm-com[bot] commented 2 years ago

This pull request introduces 1 alert when merging 254f75c17b67738d743f3467483fc2ed2f77a6b7 into 109707e9fdec3185688dce5a1d9a7f342cca069d - view on LGTM.com

new alerts:

lgtm-com[bot] commented 1 year ago

This pull request introduces 1 alert when merging 6acf8a4d38953077fdcfe546dd1371ed095e4673 into 981a5a4b07b0ee981c5b77ee908c855c47f84bb2 - view on LGTM.com

new alerts:

Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down completely on the 16th of December 2022. Please enable GitHub code scanning, which uses the same CodeQL engine that powers LGTM.com. For more information, please check out our post on the GitHub blog.

lgtm-com[bot] commented 1 year ago

This pull request introduces 1 alert when merging e4dad981dce38227754fb0a12eaf5e4bbcc2ae35 into 981a5a4b07b0ee981c5b77ee908c855c47f84bb2 - view on LGTM.com

new alerts:

howardwu-db commented 1 year ago

sample comment