apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0
13.5k stars 3.7k forks source link

postAggregations are broken in subqueries #472

Closed yuvaloren closed 10 years ago

yuvaloren commented 10 years ago

In groupBy queries containing a subquery (dataSource type=query), postAggregations in the subquery don't work.

fjy commented 10 years ago

Hi Yuval, we've done some work to optimize how post aggregations are calculated with recent code. Do you know if this is still an issue in the latest and if so, do you know the cause?

yuvaloren commented 10 years ago

This still appears to be an issue in today's master. I think the problem is around line 108 of GroupByQueryQueryToolChest.java. The IncrementalIndexStorageAdapter that's made out of the subquery results doesn't include postAggregations but should, so that the outer query can access them. I'm not sure how to fix it.

will-lauer commented 10 years ago

I needed this to work, so did a little investigating. It looks like there are two issues going on. First, in GroupByQueryQueryToolChest.mergeGroupByResult(), the call to makeIncrementalIndex when constructing the IncrementalIndexStorageAdapter is wrong. Instead of passing in the subquery, the query needs to be passed instead. This will include the necessary post-aggregator values correctly. The other problem is that mapping of fields between the inner and outer query is done using aggregators retrieved from AggregatorFactory.getCombiningFactory(), which due to its other uses, assumes that the name of the inner queries fields (aggregators or post-aggs) are the same as the outer query's aggregators that reference them. Effectively, "fieldName" in the outer queries aggregators will always be ignored and "name" used instead to find the underlying field.

will-lauer commented 10 years ago

I’m still waiting on our legal department to sign off on the CLA, so I’m unable to post a pull request at this time, but you can go ahead a send a fix.

When you are putting together the pull request, double check the unit tests. I found there was one other change I needed to make to suppress an exception that occurred when indexing data. It was having a problem around adding data when the inner query included a wider data range than the outer query. After looking at it, it appeared that this was actually a valid case, so I just commented out the exception.

Will

Will Lauer Tech Yahoo, Software Sys Dev Eng, Sr P: 217.255.4262 M: 508.561.6427 2021 S First St Suite 110 Champaign IL 61820

[http://forgood.zenfs.com/logos/yahoo.png]

From: Yuval Oren [mailto:notifications@github.com] Sent: Monday, June 09, 2014 10:48 AM To: metamx/druid Cc: Will Lauer Subject: Re: [druid] postAggregations are broken in subqueries (#472)

Thanks! Do you already have a pull request, or should I send a fix?

— Reply to this email directly or view it on GitHubhttps://github.com/metamx/druid/issues/472#issuecomment-45506226.

yuvaloren commented 10 years ago

Looking again, I think we do want to merge the subquery and apply the post-aggregations to the subquery before applying the outer query. In any case, I'll write up a test and try what you're suggesting.

will-lauer commented 10 years ago

It looked like the post-aggregations were working correctly against the subquery once I had made my change. When an aggregation in the outer query needed a post-aggregation, the value appeared to be correctly present.

Will

Will Lauer Tech Yahoo, Software Sys Dev Eng, Sr P: 217.255.4262 M: 508.561.6427 2021 S First St Suite 110 Champaign IL 61820

[http://forgood.zenfs.com/logos/yahoo.png]

From: Yuval Oren [mailto:notifications@github.com] Sent: Monday, June 09, 2014 10:56 AM To: metamx/druid Cc: Will Lauer Subject: Re: [druid] postAggregations are broken in subqueries (#472)

Looking again, I think we do want to merge the subquery and apply the post-aggregations to the subquery before applying the outer query. In any case, I'll write up a test and try what you're suggesting.

— Reply to this email directly or view it on GitHubhttps://github.com/metamx/druid/issues/472#issuecomment-45507292.

fjy commented 10 years ago

Hi @will-lauer, we already have a CLA from your company. We received it awhile ago. You may want to check internally about the details.

fjy commented 10 years ago

Hey guys, sorry I'm late to the party. I took a look into this problem as well. I think https://github.com/metamx/druid/pull/593 will address the problem but more unit test will def help to make sure.

xvrl commented 10 years ago

593 should have fixed this