UPSERT with FK enabled - Githubissues

fredbi commented 2 years ago

Describe the problem I am using UPSERT on a table with an FK enabled. I am running a test program (go) to benchmark our application.

Performance of the UPSERT is very slow when compared to Postgres on the same environments:

cockroach: ~ 800 rows/s
Postgres: ~ 3400 rows/s

When disabling the FK, the upsert rate jumps to a more satisfactory 24k rows/s (postgres: 60k rows/s).

When analyzing the execution plan captured from the console, I can see that the FK lookup performs a full scan on the remote table (see attached plan).

To Reproduce

Conditions of the UPSERT (checked different scenarios, but the impact of throughput is <10%):

use UPSERT, not INSERT ON CONFLICT
use batched inserts (of 200, 500 or 1000 values)
commit window set to 1 or 10 batches
use HASH PK with 11 buckets
10 or 20 parallel goroutines carry out the UPSERT

forced ANALYZE of the involved tables after populating the test scenario and before running the test ` SQL statement used:

UPSERT
INTO
poi_timestamps (poi_id, run_id, "timestamp", depth)
VALUES
($1:::VARCHAR, $2:::VARCHAR, $3:::TIMESTAMP, $4:::FLOAT8),
($5:::VARCHAR, $6:::VARCHAR, $7:::TIMESTAMP, $8:::FLOAT8),
($9:::VARCHAR, $10:::VARCHAR, $11:::TIMESTAMP, $12:::FLOAT8),
...

Expected behavior We'd expect the UPSERT to be on a par with Postgres, even in the presence of a FK.

Additional data / screenshots

CREATE TABLE poi (
    id character varying NOT NULL PRIMARY KEY,
    location geometry(POINT,4326),
    name text,
    category character varying DEFAULT 'OVERTOPPING'::character varying NOT NULL,
    thresholds jsonb,
    labels jsonb
);
CREATE INDEX poi_geom_idx ON poi USING gist (location);

This table is populated with a handful of records (e.g. < 100).

CREATE TABLE poi_timestamps (
    -- with FK: 
    poi_id character varying NOT NULL REFERENCES poi(id),
    -- without FK: poi_id character varying NOT NULL,
    run_id character varying NOT NULL,
    "timestamp" timestamp without time zone NOT NULL,
    depth double precision,  
    -- pg: PRIMARY KEY (poi_id, run_id, "timestamp")
    -- cockroachdb:
    PRIMARY KEY (poi_id, run_id, "timestamp") USING HASH WITH BUCKET_COUNT = 11
);

Environment:

Setup:

single node
ubuntu 20

select version();
                                           version
---------------------------------------------------------------------------------------------
  CockroachDB CCL v22.1.0-beta.3 (x86_64-pc-linux-gnu, built 2022/04/18 16:18:27, go1.17.6)

CockroachDB version: v22.1-beta.3
Server OS: ubuntu 5.13.0-39-generic #44~20.04.1-Ubuntu SMP
Client app: go 1.18 program, using pgx v4 drive

Additional context We want to transition from Postgres, but ingestion jobs that are not eligible to the IMPORT strategy all use a similar parallel UPSERT pattern. Having ingestion performance divided by 3 or 4 is not an option.

plan.txt

Jira issue: CRDB-15769

blathers-crl[bot] commented 2 years ago

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

@cockroachdb/sql-experience (found keywords: SQL statement,pgx)
@cockroachdb/sql-queries (found keywords: plan)
@cockroachdb/bulk-io (found keywords: IMPORT)

If we have not gotten back to your issue within a few business days, you can try the following:

Join our community slack channel and ask on #cockroachdb.
Try find someone from here if you know they worked closely on the area and CC them.

_{:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.}

blathers-crl[bot] commented 2 years ago

cc @cockroachdb/bulk-io

mgartner commented 2 years ago

Thanks for the detailed report. If you do SET prefer_lookup_joins_for_fks=true; for all sessions, does the performance improve?

If not, can you provide a statement bundle from one of the upserts by running it with EXPLAIN ANALYZE (DEBUG) UPSERT ...? It would be ideal if you ran this while your throughput test is running so that it is most representative of the UPSERTs being run by your program.

Note that statement bundles contain some of the data in the table, so please only share if you fine with that. You can upload the bundle here: https://upload.cockroachlabs.com/u/marcus

mgartner commented 2 years ago

If you do SET prefer_lookup_joins_for_fks=true; for all sessions ...

Alternatively, you can set the session default for this setting with the cluster setting: SET CLUSTER SETTING sql.defaults.prefer_lookup_joins_for_fks.enabled=true;. Note that this will only take affect for sessions created after this is run.

fredbi commented 2 years ago

Note that statement bundles contain some of the data in the table, so please only share if you fine with that. You can upload the bundle here: https://upload.cockroachlabs.com/u/marcus

No worries, it's mock data created for the benchmark

fredbi commented 2 years ago

@mgartner The suggested setting had the desired outcome: FK is checked using a lookup. However, the throughput is actually lower by ~ 15%. Excerpt from the new plan with the session setting (this one was picked on a 1000 values batch):

└── • constraint-check
    │
    └── • error if rows
        │ columns: ()
        │ nodes: n1
        │ actual row count: 0
        │ vectorized batch count: 0
        │
        └── • lookup join (anti)
            │ columns: (column1 varchar)
            │ estimated row count: 0 (missing stats)
            │ table: poi@poi_pkey
            │ equality: (column1) = (id)
            │ equality cols are key
            │
            └── • project
                │ columns: (column1 varchar)
                │ nodes: n1
                │ actual row count: 1,000
                │ vectorized batch count: 10
                │ estimated row count: 1,000
                │
                └── • scan buffer
                      columns: (column1 varchar, column2 varchar, column3 timestamp, column4 float, crdb_internal_poi_id_run_id_timestamp_shard_11_comp int, column4 float, check1 bool)
                      nodes: n1
                      actual row count: 1,000
                      vectorized batch count: 10
                      label: buffer 1

Now collecting detailed statement analysis....

fredbi commented 2 years ago

@mgartner I have uploaded the diagnostic traces for both execution plans (with forced lookup and without). Ref: [RVNW-WEFA]

fredbi commented 2 years ago

@mgartner digging the problem on my side with what trace is available, I've noticed 2 things:

Traces are incomplete because my setup does not use root. I see you have noticed that too and issued https://github.com/cockroachdb/cockroach/pull/80405
The FK check predates >90% of the total reported cost of the statement - this is confirmed by a statement running > 50x faster without the FK

Reading the code implementing upsert, I understand that the INSERT ... ON CONFLICT ... behaves in a totally different way, with the CBO making some transform.

I am going to check how this compares to UPSERT. I am going to check how this compares to a setup without FK and the upsert rewritten as something like:

WITH batch AS (
VALUES(),(),(...)),
filtered AS (
  SELECT /* opt-in: DISTINCT ON (key) */ key,value FROM batch b JOIN parent_table p ON p.key = b.key
)
UPSERT INTO child_table(...) 
SELECT key,value FROM filtered

mgartner commented 2 years ago

@fredbi sorry for the delay.

I have a hunch that the only want to improve performance would be something like our "FK insert fast path" which notices FK checks that are simple lookups into indexes and performs them during the execution of the INSERT rather than as a post-query step. In theory, it would be possible to do something similar for some subset of UPSERTs, though we have nothing on our roadmap for this at the moment. You could loosely test whether this might help by adjusting your test to perform INSERTs (and making the changes to avoid inserting conflicts). If you see that this performs markedly better than UPSERTs, then it would support this theory.

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!

cockroachdb / cockroach

UPSERT with FK enabled #80319