awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Data validity metrics #206

Open vandanavk opened 4 months ago

vandanavk commented 4 months ago

Is your feature request related to a problem? Please describe.
With the current list of analyzers, we don't have a way to check data validity, i.e. the presence of nulls and zeroes in the data.

Describe the solution you'd like
We would like to be able to determine the percentage of rows that have a null or zero value in a particular column.
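To make the requested metric concrete: for a column c, null % = 100 × (rows where c is null) / (total rows), and zero % is defined the same way with "c = 0" in place of "c is null". Below is a minimal plain-Python sketch of that computation over a list of row dictionaries, roughly what a fork might implement before wiring it into Spark. The function name `null_and_zero_percentages` is hypothetical and is not part of Deequ or PyDeequ. Treating a missing key as null and excluding booleans from the zero count are likewise assumptions made for illustration.

```python
from typing import Any, Iterable, Mapping, Optional


def null_and_zero_percentages(
    rows: Iterable[Mapping[str, Any]], column: str
) -> dict[str, Optional[float]]:
    """Hypothetical helper: percentage of rows whose `column` is null or zero.

    Assumptions (not from Deequ): a missing key counts as null, and
    booleans are not counted as zero even though False == 0 in Python.
    """
    total = nulls = zeros = 0
    for row in rows:
        total += 1
        value = row.get(column)  # missing key -> None -> counted as null
        if value is None:
            nulls += 1
        elif not isinstance(value, bool) and value == 0:
            zeros += 1
    if total == 0:
        # The percentages are undefined on an empty dataset.
        return {"null_pct": None, "zero_pct": None, "null_or_zero_pct": None}
    return {
        "null_pct": 100.0 * nulls / total,
        "zero_pct": 100.0 * zeros / total,
        # Null and zero are disjoint, so the combined share is their sum.
        "null_or_zero_pct": 100.0 * (nulls + zeros) / total,
    }


rows = [{"amount": 5}, {"amount": 0}, {"amount": None}, {}]
print(null_and_zero_percentages(rows, "amount"))
# → {'null_pct': 50.0, 'zero_pct': 25.0, 'null_or_zero_pct': 75.0}
```

It may also be worth checking against the Deequ version in use whether the existing analyzers already cover part of this. `Completeness` reports the fraction of non-null values, so the null share would be 1 − completeness. `Compliance` reports the fraction of rows satisfying a SQL predicate, so a predicate such as `amount = 0` could plausibly yield the zero share.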

Describe alternatives you've considered
We will probably have to implement this in Python in our own fork, but it would be great to have this capability in Deequ (Scala).

Additional context
This is similar to Tecton's data quality metrics on resultant feature values.