apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Feature] Add support of cardinality deduplication aggregation function #3717

Closed. Aitozi closed this issue 3 months ago.

Aitozi commented 3 months ago


Motivation

With a cardinality aggregation function, users can calculate UV (unique visitors) over various flexible groupings. We could support these functions as below.
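A cardinality (UV) aggregation is not additive. Summing per-batch distinct counts double-counts users who appear in several batches, so the aggregation has to keep a mergeable, deduplicated state. The following minimal Python sketch uses exact per-group sets of user ids as that state and merges them by set union. The field names `page` and `user_id` and the helper functions are hypothetical. A production implementation would more likely use a compact structure such as a bitmap or a probabilistic sketch.

```python
from collections import defaultdict


def partial_uv_state(events, group_key):
    """Build the mergeable state: for each group, the set of distinct user ids."""
    state = defaultdict(set)
    for event in events:
        state[event[group_key]].add(event["user_id"])  # hypothetical field name
    return state


def merge_states(left, right):
    """Merge two partial states by set union, group by group (deduplicates users)."""
    merged = defaultdict(set)
    for state in (left, right):
        for group, users in state.items():
            merged[group] |= users  # fresh set per group; the inputs are not mutated
    return merged


def uv(state):
    """Final cardinality: number of distinct user ids per group."""
    return {group: len(users) for group, users in state.items()}


# Two hypothetical batches of page-view events.
batch1 = [
    {"page": "home", "user_id": "u1"},
    {"page": "home", "user_id": "u2"},
    {"page": "cart", "user_id": "u1"},
]
batch2 = [
    {"page": "home", "user_id": "u1"},
    {"page": "home", "user_id": "u3"},
]

merged = merge_states(partial_uv_state(batch1, "page"), partial_uv_state(batch2, "page"))
print(uv(merged))  # {'home': 3, 'cart': 1}
```

User `u1` visits `home` in both batches. After the merge, `u1` is counted once. Adding the two per-batch `home` counts would give 4 instead of 3.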

Solution

No response

Anything else?

No response

Are you willing to submit a PR?

Aitozi commented 3 months ago

CC @zhoulii

Aitozi commented 3 months ago

Currently, there is no widely used HyperLogLog library, which could cause compatibility issues: the sketch data produced by the compute engine must be compatible with the data the storage system stores and merges. As a result, work on HyperLogLog support has been deferred.
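The compatibility concern can be made concrete with a minimal HyperLogLog sketch in Python. This is not Paimon's implementation. The 64-bit BLAKE2b hash, the default precision `p=12`, and the plain list of registers are all assumptions chosen for illustration. Two sketches can only be merged (element-wise max of registers) when every writer uses the same hash function, precision, and register layout. The compute engine and the storage system would have to agree on exactly these details.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog; hash, precision and layout are illustrative assumptions."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def _hash(self, value):
        # Assumed 64-bit hash; every writer of a mergeable sketch must use the same one.
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, value):
        x = self._hash(value)
        idx = x >> (64 - self.p)                 # top p bits select the register
        w = x & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        rank = (64 - self.p) - w.bit_length() + 1  # leading zeros in w, plus one
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Registers are only comparable when precision (and hash) match.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        e = alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:
            e = self.m * math.log(self.m / zeros)  # linear counting for small cardinalities
        return round(e)
```

Here `merge` rejects a precision mismatch. A mismatched hash function, however, cannot be detected from the registers alone, and the merged estimate would simply be wrong.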