capitalone / DataProfiler

What's in your data? Extract schema, statistics and entities from datasets
https://capitalone.github.io/DataProfiler
Apache License 2.0

Replace snappy with cramjam #1091

Closed gliptak closed 7 months ago

gliptak commented 8 months ago

@taylorfturner please assign reviewers

taylorfturner commented 8 months ago

Thanks for your patience @gliptak -- we will review your proposal

gliptak commented 8 months ago

please review

gliptak commented 8 months ago

https://github.com/capitalone/synthetic-data/pull/346

taylorfturner commented 7 months ago

@gliptak rebase onto dev and I'll approve

taylorfturner commented 7 months ago

rebase or update branch on this and I'll approve @gliptak

gliptak commented 7 months ago

green for your review @taylorfturner

gliptak commented 7 months ago

@micdavis @ksneab7 please review/approve

taylorfturner commented 7 months ago

@gliptak are you able to restore your cramjam1 branch?

This change fails tests locally with this command: DATAPROFILER_SEED=0 python3 -m unittest discover -p "test*.py"

First issue is dataprofiler/dataprofiler/tests/test_dp_logging.py", line 27, in test_default_verbosity self.assertEqual( AssertionError: 20 != 30

Second issue is dataprofiler/dataprofiler/tests/test_data_profiler.py", line 32, in test_set_seed self.assertEqual(dp.settings._seed, None) AssertionError: 0 != None
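For reference, the 20 and 30 in the first assertion match the standard library's logging level constants. The check below is a minimal sketch that is independent of DataProfiler; which of the two values the test expects is not visible in the traceback excerpt.

```python
import logging

# Map the integers from "AssertionError: 20 != 30" to stdlib level names.
for level in (20, 30):
    print(level, logging.getLevelName(level))
```

If the names are INFO and WARNING, the failure would suggest the logger's default verbosity differs between the two environments, though that reading is an assumption.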

main testing works locally and the only diff between main and dev is this PR

We can either:

taylorfturner commented 7 months ago

Checking out f8b3e5dbd4b76f0ecc291911ace9e8e21cf1ecb1 (one commit prior to this PR) and running tests on that commit confirms that tests did run on dev locally at that point in the history @gliptak

taylorfturner commented 7 months ago

Glad to ultimately do cramjam + python 3.11 -- these tests will just need to pass locally as well as remote

gliptak commented 7 months ago

@taylorfturner restored branch

as the build was green for this PR, might that indicate that different library versions were used?

taylorfturner commented 7 months ago

potentially -- I tried a bunch of different scenarios and wasn't able to find a setup where the above two tests passed locally.

Did you run DATAPROFILER_SEED=0 python3 -m unittest discover -p "test*.py" on your end locally with cramjam1?

gliptak commented 7 months ago

I did reproduce the failure locally and still cannot reproduce it in GHA

the one difference I observed is that local is Python 3.10.12 while GHA is CPython (3.10.13)
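One way to test the "different library versions" theory is to dump the interpreter and installed package versions in both environments and diff them. The script below is only a sketch using the standard library; the function names environment_snapshot and diff_packages are hypothetical helpers, not part of DataProfiler.

```python
import json
import platform
from importlib import metadata


def environment_snapshot():
    """Collect the interpreter version and every installed package version."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name.lower()] = dist.version
    return {
        "implementation": platform.python_implementation(),
        "python": platform.python_version(),
        "packages": dict(sorted(packages.items())),
    }


def diff_packages(local, remote):
    """Return {name: (local_version, remote_version)} for every mismatch."""
    names = sorted(set(local) | set(remote))
    return {
        name: (local.get(name), remote.get(name))
        for name in names
        if local.get(name) != remote.get(name)
    }


if __name__ == "__main__":
    # Run once locally and once in a GHA step, then compare the two outputs.
    print(json.dumps(environment_snapshot(), indent=2))
```

Saving the JSON from each environment and passing the two "packages" dictionaries to diff_packages would list only the packages whose versions differ, which narrows the search beyond the 3.10.12 vs 3.10.13 interpreter gap.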