awslabs / python-deequ

Python API for Deequ
Apache License 2.0

Better handling for categorySorter in CategoricalRangeRule and FractionalRangeRule #107

Open lecardozo opened 2 years ago

lecardozo commented 2 years ago

Is your feature request related to a problem? Please describe.
To fix #70 and enable support for Deequ 2.x in #100, we applied a workaround: we force a reference to the default categorySorter instead of properly supporting this feature and exposing the argument in the Python API.

Describe the solution you'd like
Ideally, we would stop forcing the default categorySorter and expose this argument in the Python API so users can choose their own sorter.

Describe alternatives you've considered
Maybe we could implement an interface, similar to what we already do with ScalaFunction2, that maps a Python function to its Scala counterpart. That said, I'm not sure how non-primitive types (such as DistributionValue) should be handled on the Python side. Ideas are welcome here 😃
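
To make the alternative a bit more concrete, here is a rough sketch of what such a wrapper might look like. Everything in it is an assumption for discussion rather than existing pydeequ code: `CategorySorter`, the `to_python`/`to_java` converter hooks, and the Python-side `DistributionValue` stand-in (with `absolute` and `ratio` fields) are hypothetical names. The only borrowed piece is the py4j convention of an inner `Java` class with an `implements` list. The open question from above, how to convert the JVM array of (String, DistributionValue) pairs to and from Python objects, is deliberately left to the two converter hooks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class DistributionValue:
    """Hypothetical Python-side stand-in for Deequ's DistributionValue."""
    absolute: int
    ratio: float


Category = Tuple[str, DistributionValue]


class CategorySorter:
    """Hypothetical wrapper that exposes a Python sorter as a scala.Function1.

    to_python / to_java are placeholder hooks for converting between JVM
    proxies and Python objects; they default to a plain list copy here.
    """

    def __init__(
        self,
        sorter: Callable[[List[Category]], List[Category]],
        to_python: Callable = list,
        to_java: Callable = list,
    ):
        self.sorter = sorter
        self.to_python = to_python
        self.to_java = to_java

    def apply(self, categories):
        # Intended to be invoked from the JVM through the py4j callback server.
        return self.to_java(self.sorter(self.to_python(categories)))

    class Java:
        # py4j convention: interfaces this Python object claims to implement.
        implements = ["scala.Function1"]


def by_ratio_desc(categories: List[Category]) -> List[Category]:
    """Example user-supplied sorter: most frequent category first."""
    return sorted(categories, key=lambda kv: kv[1].ratio, reverse=True)


# Exercised without a JVM, using plain Python values as stand-ins.
sorter = CategorySorter(by_ratio_desc)
sample = [("a", DistributionValue(1, 0.1)), ("b", DistributionValue(9, 0.9))]
ordered = sorter.apply(sample)
```

With real JVM objects, the converter hooks would be where the DistributionValue handling lives, so the user-facing sorter could stay a plain Python function over Python values.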