citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

Misleading error message when shard creation fails in create_distributed_table #1510

Open marcocitus opened 7 years ago

marcocitus commented 7 years ago

When something goes wrong during shard creation in create_distributed_table(...) we always give a message like ERROR: table "new_distributed_table" could not be colocated with search, even when the problem is unrelated to colocation.

Example posted on Slack:

postgres=# \d search
                  Table "public.search"
    Column     |            Type             | Modifiers 
---------------+-----------------------------+-----------
 id            | text                        | not null
 timestamp     | timestamp without time zone | 
 query_channel | text                        | 
 _extra_props  | jsonb                       | 
Indexes:
    "idxfinished" btree (((_extra_props -> 'params'::text) ->> 'channelID'::text))

postgres=# \d compressed
                   Foreign table "public.compressed"
    Column     |            Type             | Modifiers | FDW Options 
---------------+-----------------------------+-----------+-------------
 id            | text                        | not null  | 
 timestamp     | timestamp without time zone |           | 
 query_channel | text                        |           | 
 _extra_props  | jsonb                       |           | 
Server: cstore_server
FDW Options: (compression 'pglz')

postgres=# SELECT create_distributed_table('compressed', 'query_channel');
WARNING:  could not open extension control file "/usr/share/postgresql/9.6/extension/cstore_fdw.control": No such file or directory
ERROR:  table "compressed" could not be colocated with search

In this case, the problem was that cstore_fdw is not installed on the worker, but the error made it seem like a distributed cstore_fdw table could not be colocated with a regular distributed table.

anarazel commented 7 years ago

On 2017-07-24 09:55:37 +0000, Marco Slot wrote:

When something goes wrong during shard creation in create_distributed_table(...) we always give a message like ERROR: table "new_distributed_table" could not be colocated with search, even when the problem is unrelated to colocation.

postgres=# SELECT create_distributed_table('compressed', 'query_channel');
WARNING:  could not open extension control file "/usr/share/postgresql/9.6/extension/cstore_fdw.control": No such file or directory
ERROR:  table "compressed" could not be colocated with search

In this case, the problem was that cstore_fdw is not installed on the worker, but the error made it seem like a distributed `cstore_fdw` table could not be colocated with a regular distributed table.

We have a fair number of cases where we report worker errors as WARNING, only to ERROR out afterwards. There are a few cases where that's good-ish (e.g. selecting from RF > 1 tables), but in most cases it's bad, both because it's confusing, as here, and because sometimes a lot of unnecessary work for other shards will be finished first. A number of those exist because we used to have no error handling infrastructure, but ... I think we should replace most of them with error context usage and straight-out errors.
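
To make the suggestion above concrete, here is a minimal sketch of the "fail immediately, with context" pattern. It is written in plain Python for illustration only and is not Citus code: `WorkerError`, `ShardCreationError`, `create_shard`, and `create_shards` are hypothetical names, and the "workers" are just a dict mapping a node name to the set of extensions assumed to be installed there.

```python
class WorkerError(Exception):
    """Hypothetical stand-in for an error reported by a worker node."""


class ShardCreationError(Exception):
    """Hypothetical coordinator-side error that carries the worker's message."""


def create_shard(table, worker, installed_extensions, required_extension):
    # Simulates a worker rejecting shard creation because an extension
    # the table depends on is not installed there.
    if required_extension not in installed_extensions:
        raise WorkerError(
            f'could not open extension control file for "{required_extension}"'
        )
    return worker


def create_shards(table, workers, required_extension):
    """Create one shard per worker, stopping at the first failure.

    Instead of logging the worker error as a warning and later raising an
    unrelated generic error, the original cause is wrapped with context
    (which table, which worker) and raised right away.
    """
    created = []
    for worker, installed in workers.items():
        try:
            created.append(
                create_shard(table, worker, installed, required_extension)
            )
        except WorkerError as exc:
            raise ShardCreationError(
                f'could not create shard of table "{table}" '
                f"on worker {worker}: {exc}"
            ) from exc
    return created
```

With a setup like the one in this issue (one simulated worker missing `cstore_fdw`), the raised error names the missing extension and the worker rather than mentioning colocation, and no further workers are contacted after the first failure.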

jasonmp85 commented 7 years ago

As a related note, `table "compressed" could not be colocated with search` is a very bad error message even when it's "correct". What is search and why do I want my table colocated with it?

jasonmp85 commented 7 years ago

Oh, I see now in your example that search is a table name. But shouldn't it be in quotes like the first table name? Having one in quotes and one out of it is confusing in its own right.

sumedhpathak commented 7 years ago

@jasonmp85 Are you looking into this issue?

jasonmp85 commented 7 years ago

@sumedhpathak yes.

jasonmp85 commented 7 years ago

Oh, it didn't get assigned to me? Assigning.