awslabs / deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Apache License 2.0
3.32k stars, 539 forks

Add population stability index (PSI) to distance methods #480

Closed: bevhanno closed this 1 year ago

bevhanno commented 1 year ago

Adds the population stability index (PSI) to distance methods for numerical features.

Population stability index (PSI) is a statistical measure rooted in information theory that quantifies how much one probability distribution differs from a reference probability distribution.
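For readers unfamiliar with the measure, here is a minimal sketch of the commonly used PSI formula, PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), computed over pre-binned counts. It is written in plain Python for illustration only and is not the code from this pull request or Deequ's API; the function name `population_stability_index`, its parameters, and the `eps` floor for empty bins are all assumptions.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """Illustrative PSI over two histograms that share the same bins.

    expected_counts: bin counts of the reference distribution.
    actual_counts:   bin counts of the distribution being compared.
    eps:             assumed floor for bin proportions, so that empty
                     bins do not produce log(0) or a division by zero.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both histograms must use the same bins")

    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    if expected_total <= 0 or actual_total <= 0:
        raise ValueError("both histograms must contain observations")

    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Convert counts to proportions, flooring at eps.
        p = max(e / expected_total, eps)
        q = max(a / actual_total, eps)
        # Each bin contributes (q - p) * ln(q / p), which is never negative.
        psi += (q - p) * math.log(q / p)
    return psi
```

With this definition, identical histograms give a PSI of 0, and the value grows as the two distributions diverge. Swapping the two arguments leaves the result unchanged, since each term is symmetric in `p` and `q`.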

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

mentekid commented 1 year ago

Thanks for this, we're taking a look