cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.13k stars 3.81k forks source link

sql,cli/sql: provide human-directed feedback on quality of SQL queries #41352

Open knz opened 5 years ago

knz commented 5 years ago

We want CockroachDB to tell human users when we know their SQL is poor, so they get a chance to learn and fix it without coming to us (support).

Context

As discussed between @kannanlakshmi @tim-o @awoods187 @bdarnell:

@kannanlakshmi [we were] chatting about [the] “incident” last night and their latency issues were related to mostly bad schema design (no indexes, a bunch of full table scans for instance) and we are wondering is it that there are a set of schemas that just perform badly on CRDB but do well on others like MySQL or at a diff isolation level and our users are taken aback? Or is it that devs (generalizing here) aren’t great at thinking about databases ahead of time?

@bdarnell everyone's first experience with a sql db is like this. if they're coming from nosql (or if they're new to backend development in general, or if their prior sql experience was at small scale where it doesn't really matter), they'll hit this obstacle repeatedly until they learn to avoid it sql is especially bad for this because it lets you write queries that work just fine at small scale but are terrible at large scale, and you need either a lot of understanding of sql or to do a lot of testing at scale to avoid this that's why i really want something like #13969 to cause an early failure of queries that are likely to be problematic at scale

Constraints

Idea bag

FWIW, I (@knz) have a slight preference for option 4 above with a SQL session var to activate.

Jira issue: CRDB-5456

knz commented 5 years ago

@piyush-singh I guess that's your product area?

petermattis commented 5 years ago

See also https://github.com/cockroachdb/cockroach/issues/20060. I wonder if the optimizer's cost estimation is robust enough to provide insights here.

bdarnell commented 5 years ago

although the pgwire protocol supports a "warning" (notice) message that can carry informatio in-between query errors and results, this is neither currently supported by CockroachDB nor it is supported by many ORMs or client drivers.

Also, where clients do support warnings, it's often by turning them into errors, which means that whenever a legitimate query triggers a warning we'd probably need to provide a warning-free way to do the same thing. I'm not sure whether warnings are an appropriate mechanism for alerting the user to expensive table scans.

a high priority goal is to ensure that interactive experimentation in cockroach sql or cockroach demo prominently displays these messages, as this is an environment that is specifically provided for the purpose of teaching/learning.

This can be a start, but I think longer-term we'd rather get this into the web UI so that you can see the analysis of queries your application has actually run.

the logic must be disabled for "internal" queries issued by CockroachDB itself (e.g. pg_table_is_visible, job updates, user auth checks, event log, etc)

We also want to make sure that these queries are well-indexed, so I think it's reasonable to require that these queries are treated the same as user queries (and treat warnings-as-errors).

We could also look at what other databases do here. MySQL has a SHOW WARNINGS command, for example (which dumps a per-session warning buffer. The warnings are also streamed back as part of the network protocol).

knz commented 5 years ago

We also want to make sure that these queries are well-indexed,

yes i agree

so I think it's reasonable to require that these queries are treated the same as user queries (and treat warnings-as-errors).

no this does not follow.

1) The mechanisms that will analyze query quality are likely to incur a overhead. There is no way to control this overhead on internal queries otherwise, so the analysis should be disabled for those queries.

2) producing user-facing warnings for those queries would be highly confusing for UX: users have no control over cockroachdb-issued SQL, so the warnings would not be actionable. Ergo, whatever warning we produce for them should be invisible. So we have no reason to produce them in the first place.

awoods187 commented 5 years ago

I also filed https://github.com/cockroachdb/cockroach/issues/41341 and https://github.com/cockroachdb/cockroach/issues/41340 last week in this same space

rmloveland commented 4 years ago

@knz I want something like this so badly. Just wrote the following in https://github.com/cockroachdb/cockroach/issues/41259#issuecomment-562215431:

A variant of this idea that I would love to have when writing SQL is: - Create a session setting like "sql.enable_developer_style_warnings" or so (similar to gcc -Wall in spirit) - When in this mode, return CRDB-specific "Class 01 - warning" error codes to the client that warns when queries that are likely to affect performance are sent - e.g., 01009 - no covering index or 0100A - result set greater than 500 rows or 0100F - full table scan (the first two cases are specifically mentioned in Andy K's "Optimizing OLTP queries" doc)
knz commented 4 years ago

Oh Rich that is actually quite the insight: use regular SQL errors as warnings, when a special "study mode" is enabled.

I like it!

knz commented 4 years ago

It would also make this easy to integrate with arbitrary pg GUIs.

knz commented 4 years ago

This becomes possible now that crdb supports the Notice protocol.

github-actions[bot] commented 1 year ago

We have marked this issue as stale because it has been inactive for 18 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to CockroachDB!