citusdata / pg_shard

ATTENTION: pg_shard is superseded by Citus, its more powerful replacement
https://github.com/citusdata/citus
GNU Lesser General Public License v3.0
1.06k stars 63 forks source link

Misleading error for incompletely created tables #38

Closed onderkalaci closed 9 years ago

onderkalaci commented 9 years ago

Before this fix, if you try to execute any query on tables which are distributed with master_create_distributed_table but no shards are created yet for the table (i.e. master_create_worker_shards not called), you get unclear error message. This fix catches that case implicitly and more meaningful message is shown.

fixes #9

Review tasks:

onderkalaci commented 9 years ago

@jasonmp85 I also believe that the error message should be much more clear.

I created this pull request so that we may chat about the solution that I suggest.

Handling #9 on the planner is a reasonable solution, right? Could you please check my comments on the commit?

Another solution that may solve the issue might be to check the length of queryShardList in the planner phase. However, the solution that I sent makes code more readable I think.

jasonmp85 commented 9 years ago

I chatted with @sumedhpathak about this and we thought this pull request could have drastically fewer lines of code if we put the check for an empty shard in an existing function rather than adding new functions.

The natural place would be LoadShardIntervalList, except that master_create_worker_shards has logic that expects it to return NIL (i.e. we can't make it raise an error instead of returning NIL because master_create_worker_shards expects the NIL when checking whether shards already exist for a table).

That leaves DistributedQueryShardList. Can you just change that function to check the return value of LookupShardIntervalList and error if it's NIL (also update the comment on the function to reflect this new behavior)?

That change would be far fewer lines of code (probably no more than five?) and would accomplish the same end result. You should also be sure to add a test for the new behavior (add a query in queries.sql after the distribution call has been made but before the shards have been created).

jasonmp85 commented 9 years ago

Two things:

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.01%) when pulling 6e690cbdae08ca7994ffc5766b6208cc947c3aae on feature-issue#9 into 46bcf2c0b1a5250a286e5dd6b0e9fa42e275725f on develop.

onderkalaci commented 9 years ago

With this implementation, we couldn't specify the name of the relation for which shards are not created in the ereport (Well, sure we can specify but code gets longer and become complicated). Thus, I had to use "the distributed table" phrase instead of its name in the ereport. Also, I think we are sure that there is only a single table, because we error out in the "ErrorIfQueryNotSupported" function if there are more than one relations involved in the query.

jasonmp85 commented 9 years ago

I was envisioning that the check would actually live in DistributedQueryShardList rather than after the call to it. Right now that method always returns a list, which can be the empty list (NIL). I'm proposing we change it to always return a non-empty list or error.

Getting the relation name would also be easy because within that method the distributedTableId is in scope. It would just need a block like this:

List *prunedShardList = PruneShardList(distributedTableId, restrictClauseList, shardIntervalList);

if (prunedShardList == NIL)
{
    char *relname = get_rel_name(distributedTableId);

    ereport(ERROR, (errmsg("... %s ...", relname)));
}

return prunedShardList;

As for the parts of the error message:

My suggestions for the above are based on my reading of the style guide. I think we should avoid using quotes unless they contain dynamic strings that could contain space-separated words. If we know ahead of time that a function name looks like a function name, the quotes are "unnecessary" in the words of the style guide. Additionally, we should say could not when the user can take an action to fix something and cannot if it will remain impossible forever.

jasonmp85 commented 9 years ago

Alright, I'm going to add some checklist items to get this wrapped up. I'm being strict about error messages, but we've been lax about doing them properly, saying we'd clean them up later. So I just want to make sure we are being better about them going forward.

After the checklist items are complete I can probably merge this Thursday night or Friday (US time).

jasonmp85 commented 9 years ago

Added checklist up top! Push up some changes to address those things and we'll have a :shipit:!

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.02%) when pulling ed8f95cd9832582a25df32c7a9a80de2b7168757 on feature-issue#9 into 4583f31d43e510d095f9976db4c586784b6b2474 on develop.

onderkalaci commented 9 years ago

Below are my comments on the changes:

jasonmp85 commented 9 years ago

Two small issues. Fix them and :shipit:. You can merge it yourself with the Merge pull request button or do it in the git CLI.

Before shipping:

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.02%) when pulling acef92ffd6514de7da84afc8ed027c8bf821b53a on feature-issue#9 into 4583f31d43e510d095f9976db4c586784b6b2474 on develop.