capitalone / DataProfiler

What's in your data? Extract schema, statistics and entities from datasets
https://capitalone.github.io/DataProfiler
Apache License 2.0
1.42k stars, 158 forks

feat: compute data type profile diff #1074

Closed: scottiegarcia closed this 9 months ago

scottiegarcia commented 9 months ago

Details

Profile differences weren't being computed for data_type_stats profiles. I discovered this because PSI was only being computed for the categorical profiler. This PR simply adds a line to compute that difference.
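For context on the PSI mentioned above, here is a minimal sketch of the population stability index as it is commonly defined, computed over two count distributions (category counts or histogram bin counts). This is an illustration only, not DataProfiler's implementation; the function name and the `epsilon` floor used to avoid log(0) are assumptions.

```python
import math
from typing import Dict


def population_stability_index(
    expected: Dict[str, int], actual: Dict[str, int], epsilon: float = 1e-6
) -> float:
    """Illustrative PSI between two count distributions (hypothetical helper).

    PSI = sum over bins of (q - p) * ln(q / p), where p and q are the
    proportions of each bin in the expected and actual distributions.
    `epsilon` (an assumption) floors proportions so empty bins don't
    cause division by zero or log(0).
    """
    bins = set(expected) | set(actual)
    total_expected = sum(expected.values())
    total_actual = sum(actual.values())
    psi = 0.0
    for b in bins:
        p = max(expected.get(b, 0) / total_expected, epsilon)
        q = max(actual.get(b, 0) / total_actual, epsilon)
        psi += (q - p) * math.log(q / p)
    return psi
```

Identical distributions give a PSI of 0, and the value grows as the two distributions drift apart; the same formula applies whether the bins are categories or numeric histogram buckets.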

Additionally, dataprofiler/profilers/text_column_profile.py was using the numeric stats mixin's diff only to double-check that both profiles had the same data type. I've updated it to use the BaseColumnProfiler diff instead.
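The change above can be sketched with heavily simplified, hypothetical stand-in classes (not DataProfiler's actual code): when a subclass only needs the base class's type check, it can call that method explicitly instead of going through the mixin's numeric diff, which assumes numeric attributes and does extra work.

```python
from typing import Any, Dict


class BaseColumnProfiler:
    """Hypothetical stand-in: diff() only checks that the profile types match."""

    def diff(self, other: Any) -> Dict[str, Any]:
        if type(self) is not type(other):
            raise TypeError(
                f"Unsupported operand type(s) for diff: "
                f"'{type(self).__name__}' and '{type(other).__name__}'"
            )
        return {}


class NumericStatsMixin:
    """Hypothetical stand-in: diff() compares numeric statistics."""

    def diff(self, other: Any) -> Dict[str, Any]:
        return {"mean": self.mean - other.mean}


class TextColumn(NumericStatsMixin, BaseColumnProfiler):
    """Hypothetical text profile that inherits from both classes."""

    def __init__(self, mean: float = 0.0) -> None:
        self.mean = mean

    def diff(self, other: Any) -> Dict[str, Any]:
        # Call the base class's type check directly, bypassing the mixin's
        # numeric diff (which would run first in the MRO via super()).
        BaseColumnProfiler.diff(self, other)
        return {}
```

Calling `BaseColumnProfiler.diff(self, other)` explicitly makes the intent (a type check only) clear, whereas `super().diff(other)` here would resolve to `NumericStatsMixin.diff` because of the method resolution order.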

CLAassistant commented 9 months ago

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


AXY161 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

taylorfturner commented 9 months ago

@scottiegarcia tests failing on diff()