CartoDB / cartodb-postgresql

PostgreSQL extension for CartoDB
BSD 3-Clause "New" or "Revised" License

Cartodbfy check for PK candidates may fail with an error #296

Open rafatower opened 7 years ago

rafatower commented 7 years ago

@rafatower commented on Tue Mar 21 2017

Seen while checking import platform errors:

Exception: PG::Error: ERROR:  CDB(_CDB_Has_Usable_Primary_ID: index row requires 12736 bytes, maximum size is 8191):
ALTER TABLE some_user_table ADD CONSTRAINT cartodb_id_pk PRIMARY KEY (cartodb_id), ADD CONSTRAINT cartodb_id_integer CHECK (cartodb_ [truncated]

stack trace:

app/models/table.rb:1106:in `block (2 levels) in cartodbfy'
app/models/user.rb:580:in `block (2 levels) in transaction_with_timeout'
app/models/user.rb:577:in `block in transaction_with_timeout'
app/models/user.rb:559:in `in_database'
app/models/user.rb:576:in `transaction_with_timeout'
app/models/table.rb:1105:in `block in cartodbfy'
lib/cartodb/stats/aggregator.rb:45:in `block in timing'
lib/cartodb/stats/aggregator.rb:43:in `timing'
app/models/table.rb:1104:in `cartodbfy'
app/models/table.rb:378:in `before_create'
app/models/table/user_table.rb:200:in `before_create'
app/models/table.rb:94:in `save'
app/models/table_registrar.rb:18:in `register'
app/connectors/importer.rb:232:in `persist_metadata'
app/connectors/importer.rb:81:in `register'
app/connectors/importer.rb:55:in `block in run'
app/connectors/importer.rb:54:in `each'
app/connectors/importer.rb:54:in `run'
app/models/data_import.rb:745:in `execute_importer'
app/models/data_import.rb:428:in `dispatch'
app/models/data_import.rb:189:in `run_import!'
lib/resque/importer_jobs.rb:9:in `block in perform'
rafatower commented 7 years ago

Here's where it fails: https://github.com/CartoDB/cartodb-postgresql/blob/795d92da8db9f6fcfebdee649e0f32e9c7588403/scripts-available/CDB_CartodbfyTable.sql#L535

The problem is that, under the hood, Cartodbfy builds a b-tree index on an arbitrary candidate column to check whether its values are unique. PostgreSQL cannot store an index row larger than 8191 bytes (an index entry must fit in a single 8 kB page), so a candidate column containing very wide values makes the check itself fail with the error above, rather than simply rejecting the column as a PK candidate.
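One way to probe uniqueness without hitting the index row size limit at all would be to compare total and distinct counts instead of building an index. A rough sketch (table and column names are placeholders, not the actual Cartodbfy code):

```sql
-- Sketch: index-free uniqueness check for a PK candidate column.
-- Counting distinct values never materializes an index row, so wide
-- text values cannot trip the 8191-byte b-tree limit. NULLs are ignored
-- by count(col), so they would need a separate check for a real PK.
SELECT count(*) = count(DISTINCT candidate_id) AS is_unique
FROM some_user_table;
```

This is slower than an index probe on large tables, but it degrades gracefully instead of erroring out.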

To reproduce the issue (using a random_string helper function):

tests=# CREATE TABLE index_generation_test AS SELECT random_string(9000)::text AS candidate_id FROM generate_series(1,5);
SELECT 5
tests=# ALTER TABLE index_generation_test ADD CONSTRAINT candidate_id_pk PRIMARY KEY (candidate_id);
ERROR:  index row requires 9016 bytes, maximum size is 8191
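For completeness, `random_string` is not a PostgreSQL built-in; any function returning a text value of the requested length reproduces the failure. A minimal stand-in might look like this (the implementation details are an assumption, only the signature matters for the repro):

```sql
-- Hypothetical helper for the repro above: returns a random
-- uppercase string of the given length. Any 9000-char text value
-- exceeds the 8191-byte b-tree index row limit.
CREATE OR REPLACE FUNCTION random_string(len integer) RETURNS text AS $$
  SELECT string_agg(chr(65 + floor(random() * 26)::int), '')
  FROM generate_series(1, len);
$$ LANGUAGE sql VOLATILE;
```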