citusdata / pg_shard

ATTENTION: pg_shard is superseded by Citus, its more powerful replacement
https://github.com/citusdata/citus
GNU Lesser General Public License v3.0
1.06k stars 63 forks source link

Push simple aggregates down #113

Closed onderkalaci closed 9 years ago

onderkalaci commented 9 years ago

This PR is intended for the following : Users should be able to exploit computational power of worker nodes in servicing aggregates. At minimum, if a GROUP BY is present which references the partition column, we should be able to create a query containing certain safe aggregates.

jasonmp85 commented 9 years ago

I added my first round of questions and comments here. I also have some high-level questions:

Let's start some conversations around these questions (and maybe you can start investigating cleaning up the boolean flag issue) and I'll add more comments in the next round.

onderkalaci commented 9 years ago

@jasonmp85

onderkalaci commented 9 years ago

@jasonmp85 I did some changes on the code flow. In the beginning, I wrote a function whose name is ClassifyTargetList. That function, similar toClassifyRestrictions, could classify the target lists. But, then for other parts of the code (ie: create temp table or updating groupClause of queries), I had to understand whether aggregates are pushed down or not from the target lists. That seemed as weird as previous PR. (Btw, what do you think for that approach? I can switch to it back)

Thus, in the updated PR, I wrote ClassifyHasAggregates functions, then, depending on that values let the code flow accordingly. So, this time, the flag does not directly shows whether we pushed down the aggregates or not. They simple show the value of hasAgg field of distributed and local queries.

I think this time passing those values could not be considered as anti-pattern. What do you think?

Only CreateTemporaryTableStmt seems to be have an anti-pattern, but, Code complete 2 book says that if a function gets a boolean and immediately returns according to that value could not be considered as anti-pattern. So, I think CreateTemporaryTableStmt seems OK from that perspective.

Also, I remembered that I also took CreateStatement directly from CitusDB.

jasonmp85 commented 9 years ago

I think in addition to the two you mentioned, all of these also come from CitusDB, no?

Tell me whether these have any important changes, because from what I can see they're mostly similar to the CitusDB code and so I mostly need to look at how they're called, not what they contain.

onderkalaci commented 9 years ago

Hey @jasonmp85,

I tried to address your feedback on the changes.

My main concern whether I fell into the same situation we had previously. Could you please consider the changes with that perspective too? This time we do not pass the flag to many functions, but, we still we pass the flag to some functions. Below is your previous comments:

The first function I came across with the safeToPushDownAggregates flag made me think of the rule that boolean flags on methods is an antipattern. And so I was already thinking this when I came across many more methods that do the same thing. Is there any thing you can do to get rid of the boolean flag that we're pushing all the way down the stack? Passing in a boolean flag is basically the caller saying "I know you have a conditional somewhere in your implementation, please take this branch". Might as well hoist up the conditional and have two methods, where appropriate.

Also, I realized that the comment on targetList and/or its name should be changed. What do you think? Previously, the targetList of local/target queries were the same. But, it is not anymore the case. For example, should we change it to localTargetList? Well, it is not localTargetList as well. It basically corresponds to the temp table's schema.