citusdata / pg_shard

ATTENTION: pg_shard is superseded by Citus, its more powerful replacement
https://github.com/citusdata/citus
GNU Lesser General Public License v3.0
1.06k stars 63 forks source link

Throw error for zero-shard INSERTs #107

Closed jasonmp85 closed 9 years ago

jasonmp85 commented 9 years ago

Found this when doing CitusDB integration. If CitusDB has created an e.g. range partition table I found it "just works" with pg_shard save one caveat: if no shard exists to receive an incoming row, our zero-shard SELECT logic kicked in and pg_shard would happily put the row into the local relation.

UPDATE and DELETE are safe to run locally as the local table should be empty to begin with, but INSERT is not: we should raise an error rather than continue with a zero-shard INSERT.

onderkalaci commented 9 years ago

Hey @jasonmp85,

Changes good looks, but, I have two questions before merging it (mostly for me to understand pg_shard better):

jasonmp85 commented 9 years ago

Before CitusDB integration changes, when was it possible to hit zero shard query code?

A query might contain conflicting conditionals such as x > 1 AND x < 1. In this case, the constraint exclusion will determine that no shards are involved in the query and prune the list to the empty list.

For SELECT, UPDATE, and DELETE, we determined it was fine to hit the local (empty) table representing the distributed table: the query would have no effect. They'd even give the right result: no rows, or zero rows updated or deleted.

For INSERT, we thought that zero-shard logic made no sense: zero shard INSERTs could not happen when pg_shard always created shards to cover the whole hash space. But when testing against CitusDB, I realized that a range-partitioned table might lack a destination for a given tuple, in which case the tuple would be destined for the local table. This error catches that and prevents it.

Even with CitusDB integration, why we need to call next executor ? Why not just error out saying - no shards found (even maybe in the planner)?

If you ran SELECT * FROM table WHERE flag = true AND flag = false;, what would you expect: an error, or an empty result?

Just because no shards can be found does not mean the query should error: if no shards should be queried or modified, an empty result makes sense. Similarly, a DELETE FROM table WHERE id = 5 AND id = 6 should return "0 rows deleted", not an error.

The INSERT is only problematic because it might arise in range partitioning where no shard exists to receive a tuple. In that case, we have not accepted the tuple, so we must raise an error to signify that no shard is present that covers the incoming tuple's partition value.