cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
29.91k stars 3.78k forks source link

sql: investigate session migration error due to multiple mutations of the same table not supported #121034

Open rafiss opened 6 months ago

rafiss commented 6 months ago

In our Splunk monitoring we saw this warning:

could not prepare statement during session migration (WITH ?results? AS (UPDATE ?worker?.?job? SET ?completedon? = now(), ?state? = ?'completed'?, ?output? = $2::JSONB WHERE (?id? IN (SELECT unnest($1::UUID[]))) AND (?state? = ?'active'?) RETURNING *), ?completion_jobs? AS (INSERT INTO ?worker?.?job?(?name?, ?data?, ?keepuntil?) SELECT ?'__state__completed__'? || ?name?, ?jsonb_build_object?(?'request'?, ?jsonb_build_object?(?'id'?, ?id?, ?'name'?, ?name?, ?'data'?, ?data?), ?'response'?, $2::JSONB, ?'state'?, ?state?, ?'retryCount'?, ?retrycount?, ?'createdOn'?, ?createdon?, ?'startedOn'?, ?startedon?, ?'completedOn'?, ?completedon?, ?'failed'?, CASE WHEN ?state? = ?'completed'? THEN ?false? ELSE ?true? END), ?keepuntil? + (?keepuntil? - ?startafter?) FROM ?results? WHERE (NOT (?name? LIKE ?'__state__completed__%'?)) AND ?on_complete? RETURNING *) SELECT ?count?(*) FROM ?results?): multiple mutations of the same table ‹×› are not supported unless they all use INSERT without ON CONFLICT; this is to prevent data corruption, see documentation of sql.multiple_modifications_of_table.enabled

https://cockroachlabsgcp.splunkcloud.com/en-US/app/search/search?earliest=-7d%40h&latest=now&q=search%20index%3Dcc-app-crdb*%20%22could%20not%20prepare%20statement%20during%20session%20migration%22%20AND%20(NOT%20%22does%20not%20exist%22)%20%7C%20timechart%20span%3D1h%20count%20by%20friendly_id&display.page.search.mode=verbose&dispatch.sample_ratio=1&display.general.type=statistics&display.page.search.tab=events&sid=1711387460.229600&workload_pool=standard_perf

If this keeps occurring, we should determine how to guard against this.

Jira issue: CRDB-37038

Epic CRDB-40419

rafiss commented 5 months ago

Happened once during the 4/15 to 4/22 window: https://cockroachlabsgcp.splunkcloud.com/en-US/app/search/search?earliest=-7d%40h&latest=now&q=search%20index%3Dcc-app-crdb*%20%22could%20not%20prepare%20statement%20during%20session%20migration%22%20AND%20(NOT%20%22does%20not%20exist%22)%20%7C%20timechart%20span%3D1h%20count%20by%20friendly_id&display.page.search.mode=verbose&dispatch.sample_ratio=1&display.general.type=statistics&display.page.search.tab=events&sid=1713813537.510814&workload_pool=

rafiss commented 1 month ago

Seen once more on 7/29: https://us5.datadoghq.com/logs?query=source%3Acockroachdb%20env%3Aproduction%20%22could%20not%20prepare%20statement%20during%20session%20migration%22%20-%22does%20not%20exist%22%20AND%20%40friendly_id%3Acrl-prod-8nj%20&agg_m=count&agg_m_source=base&agg_t=count&cols=host%2Cservice&fromUser=true&messageDisplay=inline&refresh_mode=paused&storage=flex_tier&stream_sort=time%2Cdesc&viz=stream&from_ts=1722268800000&to_ts=1722276000000&live=false