Open fredbi opened 2 years ago
Hello, I am Blathers. I am here to help you get the issue triaged.
Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.
I have CC'd a few people who may be able to assist you:
If we have not gotten back to your issue within a few business days, you can try the following:
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
cc @cockroachdb/bulk-io
Thanks for the detailed report. If you do SET prefer_lookup_joins_for_fks=true;
for all sessions, does the performance improve?
If not, can you provide a statement bundle from one of the upserts by running it with EXPLAIN ANALYZE (DEBUG) UPSERT ...
? It would be ideal if you ran this while your throughput test is running so that it is most representative of the UPSERTs being run by your program.
Note that statement bundles contain some of the data in the table, so please only share if you fine with that. You can upload the bundle here: https://upload.cockroachlabs.com/u/marcus
If you do SET prefer_lookup_joins_for_fks=true; for all sessions ...
Alternatively, you can set the session default for this setting with the cluster setting: SET CLUSTER SETTING sql.defaults.prefer_lookup_joins_for_fks.enabled=true;
. Note that this will only take affect for sessions created after this is run.
Note that statement bundles contain some of the data in the table, so please only share if you fine with that. You can upload the bundle here: https://upload.cockroachlabs.com/u/marcus
No worries, it's mock data created for the benchmark
@mgartner The suggested setting had the desired outcome: FK is checked using a lookup. However, the throughput is actually lower by ~ 15%. Excerpt from the new plan with the session setting (this one was picked on a 1000 values batch):
└── • constraint-check
│
└── • error if rows
│ columns: ()
│ nodes: n1
│ actual row count: 0
│ vectorized batch count: 0
│
└── • lookup join (anti)
│ columns: (column1 varchar)
│ estimated row count: 0 (missing stats)
│ table: poi@poi_pkey
│ equality: (column1) = (id)
│ equality cols are key
│
└── • project
│ columns: (column1 varchar)
│ nodes: n1
│ actual row count: 1,000
│ vectorized batch count: 10
│ estimated row count: 1,000
│
└── • scan buffer
columns: (column1 varchar, column2 varchar, column3 timestamp, column4 float, crdb_internal_poi_id_run_id_timestamp_shard_11_comp int, column4 float, check1 bool)
nodes: n1
actual row count: 1,000
vectorized batch count: 10
label: buffer 1
Now collecting detailed statement analysis....
@mgartner I have uploaded the diagnostic traces for both execution plans (with forced lookup and without). Ref: [RVNW-WEFA]
@mgartner digging the problem on my side with what trace is available, I've noticed 2 things:
Reading the code implementing upsert, I understand that the INSERT ... ON CONFLICT ... behaves in a totally different way, with the CBO making some transform.
I am going to check how this compares to UPSERT. I am going to check how this compares to a setup without FK and the upsert rewritten as something like:
WITH batch AS (
VALUES(),(),(...)),
filtered AS (
SELECT /* opt-in: DISTINCT ON (key) */ key,value FROM batch b JOIN parent_table p ON p.key = b.key
)
UPSERT INTO child_table(...)
SELECT key,value FROM filtered
@fredbi sorry for the delay.
I have a hunch that the only want to improve performance would be something like our "FK insert fast path" which notices FK checks that are simple lookups into indexes and performs them during the execution of the INSERT rather than as a post-query step. In theory, it would be possible to do something similar for some subset of UPSERTs, though we have nothing on our roadmap for this at the moment. You could loosely test whether this might help by adjusting your test to perform INSERTs (and making the changes to avoid inserting conflicts). If you see that this performs markedly better than UPSERTs, then it would support this theory.
We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!
Describe the problem I am using UPSERT on a table with an FK enabled. I am running a test program (go) to benchmark our application.
Performance of the UPSERT is very slow when compared to Postgres on the same environments:
When disabling the FK, the upsert rate jumps to a more satisfactory 24k rows/s (postgres: 60k rows/s).
When analyzing the execution plan captured from the console, I can see that the FK lookup performs a full scan on the remote table (see attached plan).
To Reproduce
Conditions of the UPSERT (checked different scenarios, but the impact of throughput is <10%):
Expected behavior We'd expect the UPSERT to be on a par with Postgres, even in the presence of a FK.
Additional data / screenshots
This table is populated with a handful of records (e.g. < 100).
Environment:
Setup:
Additional context We want to transition from Postgres, but ingestion jobs that are not eligible to the IMPORT strategy all use a similar parallel UPSERT pattern. Having ingestion performance divided by 3 or 4 is not an option.
plan.txt
Jira issue: CRDB-15769