Dig-Doug closed this issue 2 months ago.
I believe this is actually a symptom of ordering by the primary-key columns in the wrong order.
This table has a compound primary key:
CREATE TABLE document_snippets (
    source_document_id text,
    id text,
    PRIMARY KEY (source_document_id, id)
);
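But in the query from the error, the key columns are reversed (ORDER BY id, source_document_id), which makes the query very expensive: PostgreSQL cannot use the primary-key index for that ordering, so it has to sort the whole table.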
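The fix amounts to taking the primary-key columns in their declared key order. Note that sorting the two names alphabetically would give exactly the reversed order from the error; the issue does not say how the connector currently picks the order. A minimal sketch, assuming the connector can read key positions from the standard information_schema views; build_page_query, quote_ident and the psycopg-style %s placeholders are hypothetical, not the connector's actual code:

```python
# Hypothetical sketch: the connector's real pagination code is not shown in this issue.

# Standard information_schema query returning each primary-key column together with
# its position inside the key (1 = first key column). Placeholders assume psycopg.
PK_COLUMNS_SQL = """
SELECT kcu.column_name, kcu.ordinal_position
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON kcu.constraint_schema = tc.constraint_schema
 AND kcu.constraint_name = tc.constraint_name
 AND kcu.table_name = tc.table_name
WHERE tc.constraint_type = 'PRIMARY KEY'
  AND tc.table_schema = %s
  AND tc.table_name = %s
"""


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_page_query(table: str, pk_rows: list, limit: int, offset: int) -> str:
    """Build a paginated SELECT whose ORDER BY follows the primary-key order.

    pk_rows holds (column_name, ordinal_position) pairs as returned by
    PK_COLUMNS_SQL, in any order. Sorting by ordinal_position (not by name)
    lets PostgreSQL walk the primary-key index instead of sorting the table.
    """
    ordered = [name for name, _pos in sorted(pk_rows, key=lambda row: row[1])]
    order_by = ", ".join(quote_ident(col) for col in ordered)
    return (
        f"SELECT * FROM {quote_ident(table)} "
        f"ORDER BY {order_by} LIMIT {int(limit)} OFFSET {int(offset)}"
    )
```

For this table, build_page_query("document_snippets", [("id", 2), ("source_document_id", 1)], 1000, 2000) yields the cheap form, ORDER BY "source_document_id", "id", regardless of the order in which the metadata rows arrive.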
Hi @Dig-Doug, have you tested whether removing the sorting makes it work for you?
Yes, removing the sort allows the connector to start syncing.
For reference, here is a comparison of the query costs with the keys misordered and with them in primary-key order:
EXPLAIN SELECT * FROM document_snippets ORDER BY id, source_document_id LIMIT 1000 OFFSET 2000;
Limit (cost=2440052.09..2440168.76 rows=1000 width=1405)
  -> Gather Merge (cost=2439818.74..2679752.32 rows=2056430 width=1405)
        Workers Planned: 2
        -> Sort (cost=2438818.72..2441389.25 rows=1028215 width=1405)
              Sort Key: id, source_document_id
              -> Parallel Seq Scan on document_snippets (cost=0.00..448904.15 rows=1028215 width=1405)
EXPLAIN SELECT * FROM document_snippets ORDER BY source_document_id, id LIMIT 1000 OFFSET 2000;
Limit (cost=5294.09..7940.85 rows=1000 width=1405)
  -> Index Scan using document_snippets_pkey on document_snippets (cost=0.56..6531468.94 rows=2467717 width=1405)
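The cheap plan's Limit cost can be reproduced from the Index Scan line. Assuming the planner charges a Limit node a linear share of its input's run cost (which the numbers above are consistent with), skipping OFFSET rows raises the startup cost and returning LIMIT rows adds the rest. The sort plan, by contrast, must sort every row before the first one comes out, whatever the OFFSET. A small check; limit_costs is an illustrative name, not a PostgreSQL function:

```python
def limit_costs(input_startup, input_total, input_rows, offset, limit):
    """Estimate (startup, total) cost of LIMIT/OFFSET over an input path,
    assuming the input's run cost is spread evenly over its rows."""
    per_row = (input_total - input_startup) / input_rows
    startup = input_startup + per_row * offset  # rows skipped by OFFSET
    total = startup + per_row * limit           # rows actually returned
    return startup, total


# Index Scan using document_snippets_pkey: cost=0.56..6531468.94 rows=2467717
startup, total = limit_costs(0.56, 6531468.94, 2467717, offset=2000, limit=1000)
print(f"{startup:.2f}..{total:.2f}")  # within about 0.01 of the reported 5294.09..7940.85
```

Under the same assumption, the index-scan cost of each page grows with OFFSET, so later pages of a long sync cost more even with the correct ORDER BY.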
If you'd like, you can contribute the change you've made. If not, we'll address it, but probably a bit later.
Would you like to contribute?
Bug Description
The Postgres connector fails to sync after processing a few documents in my database.
To Reproduce
Expected behavior
It should not fail. If an individual row cannot be uploaded, it should drop it and keep going.
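A minimal sketch of that expected behavior, assuming rows are uploaded one at a time by a callable that raises on failure; sync_rows, upload_row and the returned list of skipped ids are hypothetical, not the connector's API:

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger("postgres_connector_sketch")  # hypothetical logger name


def sync_rows(rows: Iterable[dict], upload_row: Callable[[dict], None]) -> List:
    """Upload each row; log and skip rows whose upload fails instead of aborting."""
    skipped = []
    for row in rows:
        try:
            upload_row(row)
        except Exception:  # broad on purpose: one bad row should not stop the sync
            logger.exception("Skipping row %r after upload failure", row.get("id"))
            skipped.append(row.get("id"))
    return skipped
```

Returning the skipped ids keeps the dropped rows visible so they can be retried or reported later.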
Environment
Running the connector on Docker