Open onderkalaci opened 8 years ago
One more note on this error message.
CREATE TABLE test_1 (key int, dummy int);
CREATE TABLE test_2 (key int, dummy int);
SELECT master_create_distributed_table('test_1', 'key', 'hash');
SELECT master_create_distributed_table('test_2', 'key', 'hash');
SELECT master_create_worker_shards('test_1', 4, 1);
SELECT master_create_worker_shards('test_2', 1, 1);
SELECT * FROM test_1 as a LEFT JOIN test_2 as b ON (a.key = b.key);
Now we don't have 1-1 shard mappings, but the query works. So, the error message Shards of relations in outer join queries must have 1-to-1 shard partitioning
seemed a bit confusing.
Is it intentinal that we do not mention n-to-1 cases in the message?
Is it intentinal that we do not mention n-to-1 cases in the message?
Mostly because there's a lot of subtlety to it. n-to-1 is valid for left joins, but for right joins is 1-to-n, and for full joins 1-to-1 only. Then there's the fact that n-to-1 refers to shard counts and 1-to-1 refers to the number of overlapping shards for each shard. And what if you join 3 tables? The message describes the simple case, but I agree it's not great. Any thoughts on improving it?
@onderkalaci -- could you copy/paste the exact error message that you see?
Is this issue related to the error message in #503?
I'm copy/pasting an error message below that a user shared with us in private to capture more info. @marcocitus / @onderkalaci, how much effort is to make this error message more helpful to the user?
demo=# SELECT COUNT(t.id) FROM "event" "t" LEFT JOIN "sensor" ON sensor.id = t.sensor_id; ERROR: cannot perform distributed planning on this query DETAIL: Shards of relations in outer join queries must have 1-to-1 shard partitioning
This particular error message came up again in a conversation last week. (+1)
@begriffs -- I also overheard that you were asking about the meaning of an error message related to joins. Was it this particular issue or the one mentioned in #577?
This particular error message came up again in another conversation this week. (+1)
Hi - I've distributed two tables with the same method and same shards count, but are partitioned on separate keys. When I do a left join on these tables, I get an error: "Shards of relations in outer join queries must have 1-to-1 shard partitioning" . Can you please explain what am I doing incorrectly?
@onderkalaci / @marcocitus -- We also had a few internal emails where users asked about this error message. Now that we have the notion of colocated tables built into Citus 6.0, is there a way to make this error message more helpful?
@ozgune I think using co-location related keywords would not help user to understand the problem in this case. We allow some outer joins even if tables are not co-located and we do not allow some outer joins even if tables are co-located.
I think Onder's suggestion is pretty good to explain the problem;
cannot run outer joins with re-partition job
It is short and to the point. However the term "re-partition job" may not be meaningful for new users. Saying that, I cannot come up with better alternative.
@samay-sharma Do you have any suggestion?
Hey @byucesoy , there seem to be 2 cases when we display this message.
One case is when the join type chosen is LOCAL_PARTITION_JOIN
and !shardIntervalsMatch
. Do you know what user behaviors / scenarios can lead us to this code path?
For the 2nd one, I think its better to show a different error message.
An error message like:- "cannot run outer join query which requires repartitioning" does seem like an improvement. Can we say something like below (i.e. is this accurate for scenarios when we hit this error)?
ERROR: cannot run outer join query if join is not on partition column
Detail: Outer joins requiring repartitioning are unsupported.
I'm trying to tie the error message to the user behavior, if possible, so that they know what to fix.
Also, I had a question. Is it possible that a user reached this as a result of missing out adding the partition column in the filter / join? If so, do we want to add a hint to add partition column to join clauses etc. like we are doing in #1074 ?
Hey @samay-sharma,
I think, we can use your suggestion for second usage of error message. In that case, error occurs because there is no partition column in the error message. However situation is different for first usage of the error message. I wrote an example;
SET citus.shard_replication_factor TO 1;
CREATE TABLE table1(column1 int);
CREATE TABLE table2(column1 int);
SELECT create_distributed_table('table1', 'column1', 'append');
SELECT create_distributed_table('table2', 'column1', 'append');
COPY table1 FROM STDIN;
1
\.
COPY table2 FROM STDIN;
1
\.
COPY table2 FROM STDIN;
1
\.
SELECT * FROM table1 FULL JOIN table2 ON table1.column1 = table2.column1;
ERROR: cannot perform distributed planning on this query
DETAIL: Shards of relations in outer join queries must have 1-to-1 shard partitioning
As you can see we can reach that path even if we join on partition column. In fact, we can only reach this path if there is a join on partition column. For this one, we can use something like this without mentioning partition column;
ERROR: cannot perform distributed planning on this query
DETAIL: Outer joins requiring repartitioning are unsupported.
There are two instances of this error message. With #1118, we changed one of them. The other one still needs some discussion.
Some queries we do not error out;
Some queries we error out;
CREATE TABLE hash_table(partition_column int);
CREATE TABLE reference_table(random_column int);
SELECT create_distributed_table('hash_table', 'partition_column');
SELECT create_reference_table('reference_table');
-- works
SELECT * FROM hash_table LEFT JOIN reference_table ON hash_table.partition_column = reference_table.random_column;
-- errors out
SELECT * FROM hash_table RIGHT JOIN reference_table ON hash_table.partition_column = reference_table.random_column;
SELECT * FROM hash_table FULL JOIN reference_table ON hash_table.partition_column = reference_table.random_column;
What do we mean by 1:1 matching?
/*
* ShardIntervalsMatch returns true if provided shard interval has one-to-one
* matching. Shards intervals must be not empty, and their intervals musht be in
* ascending order of range min values. Shard interval ranges said to be matched
* only if (1) they have same number of shards, (2) a shard interval on the left
* side overlaps with corresponding shard on the right side, (3) a shard interval
* on the right side does not overlap with any other shard. The function does not
* compare a left shard with every right shard. It compares the left shard with the
* previous and next shards of the corresponding shard to check they to not overlap
* for optimization purposes.
*/
Also please note that we only perform this check if OUTER JOIN is a LOCAL_PARTITION_JOIN. Reference table joins are BROADCAST_JOIN therefore we do not perform this check for them, as a result OUTER JOINs containing reference table do not need to have same number of shards.
Follow the steps below:
I'd expect to see
cannot run outer joins with re-partition jobs
, but the actual message is:Shards of relations in outer join queries must have 1-to-1 shard partitioning
, which is wrong since there is a 1-1 shard partitioning.