citusdata / pg_shard

ATTENTION: pg_shard is superseded by Citus, its more powerful replacement
https://github.com/citusdata/citus
GNU Lesser General Public License v3.0

Fix multi-shard SELECT to pull only needed columns #54

Closed jasonmp85 closed 9 years ago

jasonmp85 commented 9 years ago

Reworks multi-shard SELECT so that quals are evaluated only remotely, instead of both remotely and again during the local sequential scan. Because the local scan no longer re-checks them, columns mentioned only in quals no longer need to be pulled over the wire, which gives users more control over how much data a multi-shard query transfers.

fixes #33
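To illustrate the effect on the per-shard query, here is a minimal Python sketch. It is not pg_shard's C code: `Qual`, `columns_to_pull`, `shard_query`, and the shard table name `events_10001` are all hypothetical names made up for this example. It only shows that once quals are no longer re-checked locally, the column list can shrink to the projected columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Qual:
    """Hypothetical stand-in for a WHERE-clause restriction."""
    column: str   # column the restriction references
    clause: str   # SQL text of the restriction


def columns_to_pull(target_columns, quals, recheck_locally):
    """Columns the remote query must return.

    If quals are re-checked locally, their columns must come over the
    wire too; if they run only remotely, the projection is enough.
    """
    columns = list(target_columns)
    if recheck_locally:
        for qual in quals:
            if qual.column not in columns:
                columns.append(qual.column)
    return columns


def shard_query(shard_table, target_columns, quals, recheck_locally=False):
    """Build the SQL text sent to one shard (illustrative only)."""
    columns = ", ".join(columns_to_pull(target_columns, quals, recheck_locally))
    sql = f"SELECT {columns} FROM {shard_table}"
    if quals:
        sql += " WHERE " + " AND ".join(qual.clause for qual in quals)
    return sql


quals = [Qual("created_at", "created_at > '2015-01-01'")]
print(shard_query("events_10001", ["id"], quals, recheck_locally=True))
print(shard_query("events_10001", ["id"], quals))
```

With `recheck_locally=True` the sketch also selects `created_at`; with the default it selects only `id`, while the WHERE clause is sent to the shard in both cases.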

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.04%) when pulling 510ed3c9754b4ed721631fe56519abf578363dc0 on feature-reduce_column_transfer-#33 into a9564eb9c197c72883b6ed52271d071ff749ba31 on develop.

sumedhpathak commented 9 years ago

We discussed modifying the code to make it closer to postgres_fdw, which means breaking quals up into 'local' and 'remote'. The remote query would then pull the columns needed by the local quals and execute the remote quals entirely. This makes the logic cleaner and sets it up for handling volatile/mutable functions. To begin with, we expect the local quals to be NIL, as we currently push all of them down.

We also discussed using a mutator for modifying the FromExpr for the local quals, vs setting the quals to NULL. We decided on taking the simpler approach now, acknowledging that this may need to be revisited with subselect handling.
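The local/remote split described above can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the PR's C implementation: `remote_query_parts` is a hypothetical helper, and quals are modeled as `(column, clause)` tuples rather than PostgreSQL expression nodes.

```python
def remote_query_parts(target_columns, local_quals, remote_quals):
    """Return (columns, where_clauses) for the per-shard query.

    Assumption for this sketch: each qual is a (column, clause) tuple.
    Columns used by local quals must be fetched so the local scan can
    evaluate them; remote quals are shipped as WHERE clauses and their
    columns are not fetched unless something else needs them.
    """
    columns = list(target_columns)
    for column, _clause in local_quals:
        if column not in columns:
            columns.append(column)
    where_clauses = [clause for _column, clause in remote_quals]
    return columns, where_clauses


# Current behavior per the discussion: local quals are empty (NIL).
pushed_down = [("created_at", "created_at > '2015-01-01'")]
print(remote_query_parts(["id"], [], pushed_down))

# Hypothetical future case: one qual has to stay local.
kept_local = [("payload", "expensive_check(payload)")]
print(remote_query_parts(["id"], kept_local, pushed_down))
```

In the first call only `id` is fetched; in the second, `payload` is added because a local qual needs it, while `created_at` is still filtered on the shard without being returned.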

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.03%) when pulling af69860dc96665661b3ba0e9b21e307e5ab63d7a on feature-reduce_column_transfer-#33 into 304cd8f47b3ca85d4aa313b1be5c0ee8e273de09 on develop.

jasonmp85 commented 9 years ago

I haven’t written the ClassifyRestrictions function yet; for now it would trivially put all quals in the remote list and leave the local list empty. So everything needed to use these variables is there (for when the local quals eventually have something to do), but that function is absent, as is its documentation.
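As a rough picture of the trivial behavior described here, a classifier could look like the Python sketch below. The name `classify_restrictions` mirrors ClassifyRestrictions, but the signature, the `can_push_down` hook, and the string representation of quals are assumptions made for illustration; the function does not exist yet in the PR.

```python
def classify_restrictions(restrictions, can_push_down=lambda restriction: True):
    """Split restrictions into (local, remote) lists.

    The default predicate accepts everything, so every restriction lands
    in the remote list and the local list stays empty, matching the
    trivial behavior described in the comment. The predicate is a
    hypothetical hook for later rules (e.g. volatile functions).
    """
    local, remote = [], []
    for restriction in restrictions:
        if can_push_down(restriction):
            remote.append(restriction)
        else:
            local.append(restriction)
    return local, remote


local, remote = classify_restrictions(["a = 1", "b > 2"])
print(local, remote)
```

A stricter predicate, such as one rejecting clauses that call a volatile function, would move those restrictions into the local list without changing any caller.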

I didn’t want to change PlanSequentialScan’s parameter list, as its parameters are ordered analogously to those of other plan* functions in PostgreSQL.

Finally, I want to point out that our old code had weird requirements around what received the original query and what received distributedQuery. In fact, certain lists and objects built using distributedQuery were later used with query to build the sequential scan, which felt weird. I retained all those requirements, as they're actually necessary, but it bears calling out that distributedQuery has had mutations performed on it by standard_planner, which means the final localQuery is a bit of an amalgam of computed and raw things. Again, this preserves the existing behavior of pg_shard, but I wanted to mention it.

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.05%) when pulling 116da133816c32f5fb8bb51a145b22ce92d5e534 on feature-reduce_column_transfer-#33 into c56862e84d055d8c7d4f19c74c1c7dc9664735b6 on develop.

jasonmp85 commented 9 years ago

Responded to these pieces of feedback:

Needing further feedback:

jasonmp85 commented 9 years ago

(And as mentioned, I'll modify function declaration/definition order as a last step).

sumedhpathak commented 9 years ago

I think restriction is OK for now. We already have a function called QueryRestrictList, and the term has precedent in Postgres. I think we've used whereClause or selectClause in Citus previously.

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.05%) when pulling 0e7ae296ed1f42f80e3baa7fd2ee70fe29ba4a1b on feature-reduce_column_transfer-#33 into 16079449a57a4797624cabb60fe0704004ae29bd on develop.

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.05%) to 93.13% when pulling 27af8ee237121487ca76c122d0a3fcc0ff244cf3 on feature-reduce_column_transfer-#33 into 16079449a57a4797624cabb60fe0704004ae29bd on develop.
