apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Add transformation function to convert numeric to HLL/Sketch object #10197

Open snleee opened 1 year ago

snleee commented 1 year ago

It would be handy to have something like the following:

select toHLL(user_id) as hll_user_id, toThetaSketch(user_id) as theta_user_id from T

hll_user_id | theta_user_id
<hll-object-of-user-id-values> | <theta-sketch-object-user-id-values>

toHLL and toThetaSketch will need some inputs for sketch configuration.
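To make the configuration point concrete, here is a minimal, self-contained sketch of what a `toHLL`-style transform might compute. It is not Pinot's implementation: the names `to_hll`, `hll_estimate`, and the `log2m` parameter (log base 2 of the register count) are assumptions for illustration, and SHA-256 stands in for whatever hash a real library would use.

```python
import hashlib
import math


def _hash64(value):
    """Hash any value to a 64-bit integer (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def to_hll(values, log2m=12):
    """Build HyperLogLog registers from raw values.

    log2m is the hypothetical configuration input: the sketch keeps
    2**log2m registers, so a larger log2m costs more space and lowers error.
    """
    m = 1 << log2m
    tail_bits = 64 - log2m
    registers = [0] * m
    for v in values:
        h = _hash64(v)
        idx = h >> tail_bits                 # top log2m bits pick a register
        rest = h & ((1 << tail_bits) - 1)    # remaining bits
        # rank = 1-based position of the leftmost 1 bit in the remaining bits
        rank = tail_bits - rest.bit_length() + 1
        if rank > registers[idx]:
            registers[idx] = rank
    return registers


def hll_estimate(registers):
    """Estimate the distinct count from HLL registers (valid for m >= 128)."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw
```

Because each register only keeps a maximum, feeding the same `user_id` twice leaves the registers unchanged, which is what makes the object safe to aggregate later.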

mayankshriv commented 1 year ago

+1

I am hoping this would help us generate HLL / sketches during ingestion (from raw data) and roll up at the same time.

davecromberge commented 1 year ago

+1

Would this apply to both offline and real-time ingestion? With regard to theta sketches, they take an additional parameter that controls the number of retained entries, which ultimately affects both size and accuracy. This might be worth taking into consideration as an argument to the transform function.
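As a rough illustration of the retained-entries trade-off, below is a simplified theta (K-minimum-values) sketch. It is only a sketch under stated assumptions: `to_theta_sketch`, `theta_estimate`, and `nominal_entries` are hypothetical names, not the DataSketches or Pinot API, and the builder sorts all hashes in memory for brevity.

```python
import hashlib

MAX_HASH = 1 << 64  # hashes are treated as integers in [0, 2**64)


def _hash64(value):
    """Hash any value to a 64-bit integer (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def to_theta_sketch(values, nominal_entries=4096):
    """Return (theta, retained_hashes) for the given values.

    nominal_entries (assumed parameter) caps how many hashes are retained.
    theta is kept as an integer threshold; MAX_HASH means "exact mode".
    """
    hashes = sorted({_hash64(v) for v in values})
    if len(hashes) <= nominal_entries:
        return MAX_HASH, hashes
    # Keep the k smallest hashes; the (k+1)-th smallest becomes theta.
    return hashes[nominal_entries], hashes[:nominal_entries]


def theta_estimate(sketch):
    """Estimate distinct count as retained / (fraction of hash space kept)."""
    theta, retained = sketch
    return len(retained) / (theta / MAX_HASH)
```

With this construction the sketch never stores more than `nominal_entries` hashes, and the relative standard error shrinks roughly as 1/sqrt(nominal_entries), so the argument directly trades size for accuracy.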

It would be nice to have a data type abstraction associated with metrics - in this way users could create additional data types and know exactly which functions are required to support them through the stack.

davecromberge commented 1 year ago

@mayankshriv have you got any links to PRs that have introduced similar features?

What we would like to do is create a sketch metric from string dimensions on ingestion and reduce the number of rows stored by orders of magnitude.
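The row reduction described here could look roughly like the following: each raw row becomes a single-entry sketch, and rows sharing the same dimension key are merged by sketch union. This is a simplified illustration, not Pinot's ingestion path; `rollup`, `union`, `estimate`, and the default `k` are hypothetical, and the theta sketch is the same simplified K-minimum-values form with an integer threshold.

```python
import hashlib

MAX_HASH = 1 << 64  # theta == MAX_HASH means the sketch is still exact


def _hash64(value):
    """Hash any value to a 64-bit integer (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def union(a, b, k):
    """Merge two (theta, sorted_hashes) sketches, keeping at most k hashes."""
    theta = min(a[0], b[0])
    merged = sorted(h for h in set(a[1]) | set(b[1]) if h < theta)
    if len(merged) > k:
        theta = merged[k]      # the (k+1)-th smallest hash becomes theta
        merged = merged[:k]
    return theta, merged


def rollup(rows, k=1024):
    """Collapse (dimension_key, raw_value) rows into one sketch per key."""
    table = {}
    for key, raw_value in rows:
        single = (MAX_HASH, [_hash64(raw_value)])
        table[key] = union(table[key], single, k) if key in table else single
    return table


def estimate(sketch):
    """Estimate distinct count as retained / (fraction of hash space kept)."""
    theta, retained = sketch
    return len(retained) / (theta / MAX_HASH)


# 2,050 raw rows collapse to 2 stored rows, one sketch per dimension value.
rows = [("US", f"user-{i % 500}") for i in range(2000)]
rows += [("DE", f"user-{i}") for i in range(50)]
rolled = rollup(rows)
```

Since union is associative and insensitive to duplicates, the same merge could in principle be reused when combining already-rolled-up rows, which is where the row savings come from.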