Closed doenis-usk closed 2 months ago
This issue can be reproduced with the following modification to the test case: (I used approxHistogram for an example, but any complex aggregator will do)
diff --git a/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java b/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
index c38b3350a1..058c4c7388 100644
--- a/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
+++ b/extensions-core/histogram/src/test/java/org/apache/druid/query/aggregation/histogram/ApproximateHistogramAggregationTest.java
@@ -139,10 +139,17 @@ public class ApproximateHistogramAggregationTest extends InitializedNullHandling
String ingestionAgg = ignoreNulls ? "approxHistogramFold" : "approxHistogram";
String metricSpec = "[{"
- + "\"type\": \"" + ingestionAgg + "\","
- + "\"name\": \"index_ah\","
- + "\"fieldName\": \"index\""
- + "}]";
+ + "\"type\":\"filtered\","
+ + "\"filter\": {"
+ + " \"type\": \"regex\","
+ + " \"dimension\":\"market\","
+ + " \"pattern\":\".+\""
+ + "},"
+ + "\"aggregator\":{"
+ + " \"type\": \"" + ingestionAgg + "\","
+ + " \"name\": \"index_ah\","
+ + " \"fieldName\": \"index\""
+ + "}}]";
String parseSpec = "{"
+ "\"type\" : \"string\","
Note that the filtered aggregator matches all values, so the test result should not be affected by it. Currently, it filters out all the rows.
After some investigation, I found the modification above did not affect the test result before https://github.com/apache/druid/pull/9484 was introduced.
ValueMatcher returned in FilteredAggregatorFactory#factorize
was made by IncrementalIndexInputRowColumnSelectorFactory#makeDimensionSelector
before, now it is made by IncrementalIndexInputRowColumnSelectorFactory#makeColumnValueSelector
.
In this case, IncrementalIndexInputRowColumnSelectorFactory#makeDimensionSelector
infers the columnValueSelector to be made is complex since underlying aggregator has complex type, so it wraps the selector to use ComplexMetricSerde.
https://github.com/apache/druid/blob/e5f0da30ae15369f66c7e9ecc05a41c3d49eb2e6/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java#L133
This is a bit strange, as the selected column is for filtering unlike normal aggregation, but it successfully handled complex types and applied filters before.
https://github.com/apache/druid/pull/9484/files#diff-310a154902921cae9b118d769560f297L123-L130
https://github.com/apache/druid/blob/c6c2282b59cda107089a9b3944477fd630bc0657/processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java#L215
Now, if the selector is complex, it is considered unfilterable and it never matches values. https://github.com/apache/druid/blob/c557a1448d872e3aab03aec3bafb035e74583656/processing/src/main/java/org/apache/druid/segment/filter/PredicateValueMatcherFactory.java#L89
An easy fix will be to add some condition to judge whether a columnValueSelector should be complex when referenced column is for filtering and not for delegated complex aggregation of filtered aggregator:
diff --git a/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java b/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
index 8778700a7e..37c43cd0ce 100644
--- a/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
+++ b/processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndex.java
@@ -131,7 +130,7 @@ public abstract class IncrementalIndex<AggregatorType> extends AbstractIndex imp
@Override
public ColumnValueSelector<?> makeColumnValueSelector(final String column)
{
- final boolean isComplexMetric = ValueType.COMPLEX.equals(agg.getType());
+ final boolean isComplexMetric = agg.requiredFields().contains(column) && ValueType.COMPLEX.equals(agg.getType());
I can do a PR, but more appropriate solutions are welcome.
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
Is this fixed in any latest release?
This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.
This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.
affected version
0.18.1
Description
I use extension metrics(DataSketches Quantiles Sketch or Approximate Histogram) with Filtered Aggregator at ingestion time. When I post a query, I got only
NaN
result.However, when I use longSum or count metrics with Filtered Aggregator at ingestion time, I got the numeric result.
ingestion spec
For example, I ingest two metrics (1)latency_histogram, (2)latency_histogram_without_filter and status 200 data exists.
query
When I post below query, latency_histogram is NaN and latency_histogram_without_filter is numeric result.