apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

feature: stringMax, stringMin aggregations #16331

Open ColeAtCharter opened 6 months ago

ColeAtCharter commented 6 months ago

Description

The stringMin/stringMax aggregations should return a string that can be filtered on without finalization, unlike complex aggregations such as stringFirst/stringLast. They should be usable both at ingest time and at query time. Eventually, the aggregator could include a configuration property for the collation used in the comparison.

This implies the ability to have string metric columns, which could eventually allow users to create custom aggregations that can be consumed directly without finalization, without special serializing/deserializing, and without writing custom logic (e.g., extensions).
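To make the request concrete, here is a minimal sketch of the property being asked for: a max/min over strings whose partial result is itself a plain string, so merging partial results and reading the final value need no separate finalization step. This is illustrative Python, not Druid code; the function names `string_max`, `string_min`, and `aggregate` are hypothetical, and the comparison is Python's default code-point ordering rather than any configurable collation.

```python
from functools import reduce
from typing import Callable, Iterable, Optional

# Hypothetical helpers for illustration only; not part of Druid's API.
# None stands for "no value seen yet".


def string_max(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Combine two partial results, keeping the larger string."""
    if a is None:
        return b
    if b is None:
        return a
    # Python compares strings by Unicode code point (no collation).
    return a if a >= b else b


def string_min(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Combine two partial results, keeping the smaller string."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a <= b else b


def aggregate(
    values: Iterable[str],
    combine: Callable[[Optional[str], Optional[str]], Optional[str]],
) -> Optional[str]:
    """Fold raw values into a single partial result."""
    return reduce(combine, values, None)


# Two hypothetical segments, each pre-aggregated independently.
segment_a = aggregate(["pear", "apple"], string_max)
segment_b = aggregate(["zebra", "mango"], string_max)

# Merging partial results uses the same function as aggregating raw values,
# and the merged state is already the final, filterable string.
merged_max = string_max(segment_a, segment_b)
merged_min = string_min(
    aggregate(["pear", "apple"], string_min),
    aggregate(["zebra", "mango"], string_min),
)
```

Because the partial state and the final answer have the same type, a filter such as `merged_max == "zebra"` can be applied to the stored value directly.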

LakshSingla commented 3 months ago

> The stringMin/stringMax aggregation should return a string which can be filtered on without finalization, unlike a complex aggregation like stringFirst/stringLast. It should be usable at ingest and at query time. Eventually, the aggregator could include a configuration property for the collation to be used for the comparison.

What's the use case of such an aggregation? Intermediate results are important for merging along the way, so I am unsure if I am interpreting the ask correctly. Can you share the pain point you are facing with the way Druid is right now? That would aid my understanding.

> directly consumed without finalization

At this point, intermediate results and finalization are a necessary evil: we need intermediate results in order to merge partial aggregates, and finalization ensures that users need to know as little as possible about them.
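The role of intermediate results can be sketched with a "first string" style aggregation. The example below assumes the intermediate state is a (timestamp, value) pair; the names `FirstState`, `merge_first`, and `finalize_first` are hypothetical and are not Druid's actual classes. It only illustrates why the bare string is not enough to merge partial results.

```python
from typing import NamedTuple, Optional


class FirstState(NamedTuple):
    """Assumed intermediate state: the value plus the timestamp it was seen at."""
    timestamp: int
    value: str


def merge_first(a: Optional[FirstState], b: Optional[FirstState]) -> Optional[FirstState]:
    """Merge two partial results by keeping the one with the earlier timestamp."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a.timestamp <= b.timestamp else b


def finalize_first(state: Optional[FirstState]) -> Optional[str]:
    """Drop the bookkeeping timestamp and expose only the string."""
    return None if state is None else state.value


# Partial results from two hypothetical segments.
partial_a = FirstState(timestamp=200, value="late")
partial_b = FirstState(timestamp=100, value="early")

merged = merge_first(partial_a, partial_b)
result = finalize_first(merged)
```

If only the strings "late" and "early" had been kept, there would be no way to decide which one came first during the merge. A max or min over strings does not have this problem, since comparing the two strings is all the merge needs.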

> without special serializing/deserializing

What does this mean?

> and without writing custom logic (eg, extensions)

What is meant by "custom logic"?