Closed saicitus closed 8 years ago
We made some late changes in outer joins to translate anti-joins back to left joins, but for hash-distributed tables PruneShardList
now wrongly interprets r.val IS NULL
as a column filter and prunes all the shards except the one which contains hash(NULL)
. A quick workaround is to use IS NULL
on a different column.
A possible fix would be to disregard <partition key> IS NULL
clauses during shard pruning.
The reason that this is not caught in the regression tests, which include anti-joins with correct results, is that shard pruning appears to work differently for hash-distributed and append-distributed tables regarding IS NULL
clauses an the regression tests use append-distributed tables.
Assigning to @jasonmp85
OK, so we've been doing some forensics to figure out (a) why NULL
is mapped to partition zero, (b) what code paths are affected, and (c) whether this is a "feature" we actually need.
@lithp started out the conversation with:
Making it consistent by not transforming
IS NULL
to= 0
sounds sensible. Maybe we could also require that the partition column beNOT NULL
so the requirement we're enforcing surfaces to users sooner.
I agreed, but had some vague memory of this being an intentional change. @lithp thought this emanated from the original pg_shard
codebase, but hit a dead end at the point of the initial pg_shard
open-source import from our closed-source repo.
After poking around historic copies of various repos, I think the chronology was:
pg_shard
's shard pruning logic, but we had changes from Citus proper we wanted to ensure would be 100% compatiblepg_shard
, before it ever actually made it into CitusThe net result of the above is that it looks like this hit Citus after pg_shard
, even though it originally came from a code review opened against Citus.
After remembering those events, I think this change was introduced to ensure the hash partition pruning algorithm remained identical to existing behavior in hash repartition JOIN
s, which return 0
for NULL
columns. I also (erroneously) believed that NULL
values could be INSERT
ed to distributed tables, something @marcocitus had already pointed out as false.
@metdos chimed in with:
As Jason noted, I wrote shard pruning on the hash-partitioned tables. At that time, we decided to map
NULL
values to0
. I don't remember any discussion about not allowingNULL
values at that time.
This morning, @marcocitus reiterated that NULL
values were never permitted by our INSERT
pipeline:
We already prevent inserting NULL values for the partition column into hash-distributed tables.
postgres=# INSERT INTO test VALUES (NULL); ERROR: cannot plan INSERT using row with NULL value in partition column
I still need to verify that e.g. copy also rejects NULL
partition values, but if it does, this removes the ingest objection to just removing @lithp's original suggestion.
With that reminder, I still thought the repartitioning logic might use the codepath in question. Turns hash repartition joins (worker_hash_partition_table
) just calls the hash function for the column in question, using %
to "bucket" the values into hash bins. This is substantially different than the old pg_shard
code, as referenced in #129 (which @sumedhpathak pointed out this morning).
So (assuming the copy path also forbids NULL
values), all the obstacles to the "simple" approach have been removed.
For future reference, the alternative way of addressing this bug would be to not push down IS NULL
constraints during outer joins (instead applying them on the master during the final pass). I believe something could be modified during this section to accomplish this. Then we'd be able to preserve NULL
ability of partition columns, if we desired.
But since that's already not permitted, it seems like no problem to go with @lithp's original suggestion.
OK, copy also forbids NULL
in the partition column:
ERROR: 22004: cannot copy row with NULL value in partition column
CONTEXT: COPY test, line 1: "NULL"
Steps
Expected
(2 rows)
Actual
(5 rows)