apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Min() and Max() aggregate functions on string columns #16956

Open johnImply opened 2 months ago

johnImply commented 2 months ago

Description

Provide min() and max() aggregate functions on string expressions.

This is related to https://github.com/apache/druid/issues/11659, but here I am calling out the explicit behavior desired.

Motivation

Many database products extend the min() and max() aggregate functions to string datatypes, primarily based on a lexical sort.

I assume this would follow the same sorting mechanism as the one used by the ORDER BY clause.
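
To make the requested semantics concrete, here is a minimal sketch in plain Java, not Druid code. It assumes that "lexical sort" can be approximated by Java's natural `String` ordering (a comparison of UTF-16 code units) and that null values are skipped; both are assumptions for illustration, and the ordering Druid actually applies for ORDER BY may differ for some characters. The class name `StringMinMaxSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Hypothetical illustration of min()/max() over strings; not part of Druid.
final class StringMinMaxSketch {

    // Smallest non-null value in natural (code-unit) order, or empty if none exists.
    static Optional<String> min(List<String> values) {
        return values.stream()
                .filter(Objects::nonNull)            // assumption: nulls are ignored
                .min(Comparator.naturalOrder());
    }

    // Largest non-null value in natural (code-unit) order, or empty if none exists.
    static Optional<String> max(List<String> values) {
        return values.stream()
                .filter(Objects::nonNull)
                .max(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<String> cities = Arrays.asList("Paris", "berlin", null, "Amsterdam");
        System.out.println(min(cities).orElse(null)); // Amsterdam
        System.out.println(max(cities).orElse(null)); // berlin
    }
}
```

Note that this ordering is case-sensitive: uppercase ASCII letters sort before lowercase ones, which is why "berlin" is the maximum here.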

GWphua commented 2 months ago

Hello, I plan to give this a shot.

I am referencing the other MinAggregator classes, and am now figuring out what the BufferAggregator and AggregatorFactory are for. While I am confident that I will be able to get somewhere in this, any tips from any experienced developers will be welcome, and will definitely speed up my learning process. 😄

abhishekagarwal87 commented 2 months ago

You can refer to the javadocs of these interfaces to get an idea of what these classes are for. AggregatorFactory is what you use to get an on-heap aggregator, an off-heap aggregator, or a vector aggregator (also off-heap). Depending on the use case (ingestion vs. query) and the query engine (topN, grouping), either the on-heap or the off-heap aggregator is used.

When you build a new kind of aggregation, you make all of these flavours available so that your aggregation can work in all of these settings.
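
As a rough illustration of the on-heap vs. off-heap distinction, here is a simplified sketch of a string "min" in both styles. The classes `OnHeapStringMin` and `BufferStringMin`, their method names, and the slot layout (a 4-byte length followed by up to `maxBytes` of UTF-8) are all hypothetical; they do not implement Druid's actual interfaces, and the javadocs remain the authority on the real contracts.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical on-heap flavour: the state lives in an ordinary Java field.
final class OnHeapStringMin {
    private String min; // null means "no value seen yet"

    void aggregate(String value) {
        if (value != null && (min == null || value.compareTo(min) < 0)) {
            min = value;
        }
    }

    String get() {
        return min;
    }
}

// Hypothetical off-heap flavour: the state lives in a caller-supplied buffer slot,
// so one instance can serve many groups, each at its own position.
final class BufferStringMin {
    private static final int EMPTY = -1; // length marker for "no value seen yet"
    private final int maxBytes;          // assumed fixed cap on stored UTF-8 bytes

    BufferStringMin(int maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Bytes each slot occupies: a 4-byte length plus the string payload.
    int slotSize() {
        return Integer.BYTES + maxBytes;
    }

    void init(ByteBuffer buf, int position) {
        buf.putInt(position, EMPTY);
    }

    void aggregate(ByteBuffer buf, int position, String value) {
        if (value == null) {
            return;
        }
        String current = get(buf, position);
        if (current == null || value.compareTo(current) < 0) {
            byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
            int len = Math.min(bytes.length, maxBytes); // truncates long values
            buf.putInt(position, len);
            for (int i = 0; i < len; i++) {
                buf.put(position + Integer.BYTES + i, bytes[i]);
            }
        }
    }

    String get(ByteBuffer buf, int position) {
        int len = buf.getInt(position);
        if (len == EMPTY) {
            return null;
        }
        byte[] bytes = new byte[len];
        for (int i = 0; i < len; i++) {
            bytes[i] = buf.get(position + Integer.BYTES + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The fixed-size slot is the main design constraint this sketch is meant to surface: a string has no natural fixed width, so the off-heap version needs some cap (or an indirection), and truncation in this sketch can split a multi-byte character or change comparison results for values longer than `maxBytes`.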