Open zeotuan opened 2 months ago
Is your feature request related to a problem? Please describe.
Currently, the `KLLSketch` and `DataType` analyzers are implemented using `UserDefinedAggregateFunction`:
https://github.com/awslabs/deequ/blob/3b1a3ec5d1aac8e5e15e694be709530fd343d8a3/src/main/scala/com/amazon/deequ/analyzers/catalyst/StatefulKLLSketch.scala#L29
https://github.com/awslabs/deequ/blob/3b1a3ec5d1aac8e5e15e694be709530fd343d8a3/src/main/scala/com/amazon/deequ/analyzers/catalyst/StatefulDataType.scala#L26
`UserDefinedAggregateFunction` is deprecated and should be replaced with `Aggregator`, which offers much better performance, as outlined here: https://github.com/apache/spark/pull/25024#issue-293548866
Describe the solution you'd like
Reimplement `StatefulDataType` and `StatefulKLLSketch` using `Aggregator`.
I am happy to help with this implementation.
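To illustrate the shape of the migration, here is a minimal sketch of a typed `Aggregator` registered through `functions.udaf`, assuming Spark 3.0 or later. `TypeCounts` and `DataTypeCounter` are hypothetical names, and the regex classification is a simplified stand-in, not the logic currently in `StatefulDataType`.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, udaf}

// Hypothetical buffer type: a simplified stand-in for the analyzer state.
case class TypeCounts(nulls: Long, integral: Long, fractional: Long, other: Long)

// Hypothetical aggregator: classifies string values and counts each class.
// The rules below are illustrative only and do not mirror Deequ's behavior.
object DataTypeCounter extends Aggregator[String, TypeCounts, TypeCounts] {
  private val Integral = """[-+]?\d+""".r
  private val Fractional = """[-+]?\d*\.\d+""".r

  // Empty buffer used as the starting point of every partial aggregation.
  def zero: TypeCounts = TypeCounts(0L, 0L, 0L, 0L)

  // Fold one input value into the buffer.
  def reduce(b: TypeCounts, value: String): TypeCounts = value match {
    case null         => b.copy(nulls = b.nulls + 1)
    case Integral()   => b.copy(integral = b.integral + 1)
    case Fractional() => b.copy(fractional = b.fractional + 1)
    case _            => b.copy(other = b.other + 1)
  }

  // Combine two partial buffers computed on different partitions.
  def merge(a: TypeCounts, b: TypeCounts): TypeCounts =
    TypeCounts(
      a.nulls + b.nulls,
      a.integral + b.integral,
      a.fractional + b.fractional,
      a.other + b.other
    )

  // The final result is the buffer itself in this sketch.
  def finish(reduction: TypeCounts): TypeCounts = reduction

  def bufferEncoder: Encoder[TypeCounts] = Encoders.product[TypeCounts]
  def outputEncoder: Encoder[TypeCounts] = Encoders.product[TypeCounts]
}

object AggregatorSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("aggregator-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("1", "2.5", "abc").toDF("value")

    // udaf wraps the typed Aggregator so it can be applied to untyped columns.
    val countTypes = udaf(DataTypeCounter)
    df.select(countTypes(col("value")).as("counts")).show(truncate = false)

    spark.stop()
  }
}
```

The buffer here is a plain case class encoded with `Encoders.product`. A real port of `StatefulKLLSketch` would need to pick an encoder for its sketch state, which this example does not attempt to cover.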