cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.22k stars 3.82k forks source link

Creating View takes longer than expected on a small table [spending time in schema change] #118240

Open steven-hubbard opened 10 months ago

steven-hubbard commented 10 months ago

The following thread has more details as well as a tsdump, statement bundle, and debug.zip: Slack Channel Link

When creating a view on the primary key of a table, the view takes almost 2s to complete. The table only has 7 rows and it seems like it should be significantly faster with a where clause specifying the two leading values of the primary key in the view creation. The files above were generated in a three node single region environment run locally on a macbook. The timing is similar in a multi-region cluster.

Screenshot 2024-01-23 at 1 53 55 PM

Jira issue: CRDB-35636

rytaft commented 10 months ago

[triage meeting] Drew looked in the linked bundle and noticed that time was spent in kv. Still sending to SQL Foundations since they own view creation.

DrewKimball commented 10 months ago

The statement bundle is in this comment from the slack convo above: https://cockroachlabs.slack.com/archives/C01RX2G8LT1/p1706024952464799?thread_ts=1705432897.726909&cid=C01RX2G8LT1

rafiss commented 10 months ago

Here is a screenshot of the slow part of the trace: image

In this thread, the theory was that this is from Raft replication, which unfortunately is not well traced.

I will move this issue to the KV team, since to make progress, I think we need more tracing in that area.

pav-kv commented 10 months ago

The raft delay part is only 90ms in this trace, which doesn't seem too high without additional context. It involves at least 1 RTT and fsync, so can easily take that much.

The issue reports 2s latency, there seems to be no confirmation that most of this time is spent in replication. If it's 90ms out of 2s, then the slowness is somewhere else.

rafiss commented 10 months ago

Sorry for the confusion. There are two different executions being discussed in this ticket.

pav-kv commented 10 months ago

For tracing we have #104035.

Can we get a more representative trace for a higher latency request? I am not convinced by this trace that the 2s problem would be in raft, unless there are some connection or disk issues in this cluster. I'll check debug.zip to see if raft is healthy.

pav-kv commented 10 months ago

@rafiss I think there are still unclear moments which SQL could help clarify.

Statement bundle of this CREATE VIEW is consistently showing 200ms latency (we've just tried it another time with @steven-hubbard). However, the SQL command line consistently prints up to 2s for this statement. It's unclear where the other 1.8s are spent. How closely does the Time: 1.778s total on the screenshot correspond to time measurements in the bundle? Is there any large portion of computation that is missing from the bundle?

DrewKimball commented 10 months ago

Here's the line with verbose_timings enabled:

Time: 1.577294273s total (parse 50µs / plan 15.281ms / exec 27.173ms / schema change 1.431111s / other 103.474ms / network 205.273µs)
DrewKimball commented 10 months ago

It looks like we wait for the schema change to finish (make the descriptor public?), and that isn't recorded in the server-side statistics.