cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

sql: add query denylist facility #51643

Open jordanlewis opened 4 years ago

jordanlewis commented 4 years ago

It can be desirable to deny a query fingerprint from being run on the database, because of bugs in CockroachDB, difficulty in tracking down expensive queries in a workload, or other reasons. There's currently no way to do this besides turning off the source of the query.

We should add a facility to deny a particular query from being run at all. I'm imagining a system table that has a query fingerprint as a key. We'd need some way to cache this information, since it would not be okay to check the denylist on receipt of every query.
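For concreteness, here is a minimal sketch (in Go, with entirely hypothetical names; not CockroachDB's actual internals) of what such a cached, fingerprint-keyed check could look like, including a dry-run flag of the kind discussed below:

package denylist

import (
	"log"
	"sync"
)

// Rule is a hypothetical denylist entry for one query fingerprint.
type Rule struct {
	Fingerprint string
	DryRun      bool // log only, do not block
}

// Cache is the per-node, in-memory copy of the denylist, so the system
// table is not read on receipt of every query.
type Cache struct {
	mu    sync.RWMutex
	rules map[string]Rule
}

// Replace swaps in a freshly loaded set of rules (e.g. from the system table).
func (c *Cache) Replace(rules []Rule) {
	m := make(map[string]Rule, len(rules))
	for _, r := range rules {
		m[r.Fingerprint] = r
	}
	c.mu.Lock()
	c.rules = m
	c.mu.Unlock()
}

// ShouldBlock is called once a query has been fingerprinted. A dry-run rule
// only logs; an active rule blocks.
func (c *Cache) ShouldBlock(fingerprint string) bool {
	c.mu.RLock()
	r, ok := c.rules[fingerprint]
	c.mu.RUnlock()
	if !ok {
		return false
	}
	if r.DryRun {
		log.Printf("denylist dry run: query %q would have been blocked", fingerprint)
		return false
	}
	return true
}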

Conceivably, a similar mapping could be used for query plan management, so whatever we decide on representation-wise, we should make sure that we can extend the "settings" for a query fingerprint in new ways.

Another important feature, mentioned by @joshimhoff, is the inclusion of a "dry run" flag. Blocking a query fingerprint is very disruptive and dangerous. It's important to be able to test the denial of a fingerprint before activating it. Turning on the "dry run" flag for a query fingerprint should cause the database to log when it receives a query that would be blocked by that rule.

Probably, a feature like this should also get a time-series metric, so that operators can detect when it is taking effect without having to look at the logs.


Jira issue: CRDB-4022

jordanlewis commented 4 years ago

@solongordon, I've tentatively assigned this to the features team, but it's certainly up for debate.

joshimhoff commented 4 years ago

Thanks for opening this!

Probably, a feature like this should also get a time-series metric, so that operators can detect when it is taking effect without having to look at the logs.

Ya. Could also have an admin UI page (not mutually exclusive with the time-series metric).

otan commented 4 years ago

Calling out adapting something like https://github.com/dropbox/load_management as a solution - this library does rate limiting for query patterns in general, as opposed to a straight-up denylist, if that's something we ever decide to foray into...

joshimhoff commented 4 years ago

SRE really needs this!

I found a duplicate issue from January: https://github.com/cockroachdb/cockroach/issues/44597. That was an action item from this incident: https://cockroachlabs.atlassian.net/wiki/spaces/MC/pages/273449722/2020-01-22+import+job+of+death+crash-loops+cluster. Should have poked it more!

Two Fridays ago, we had two production issues that could have been mitigated much quicker and with much less effort (from SRE & DB eng), if a denylist was available:

  1. Customer uses an experimental feature (tmp tables? related to tmp tables...) in such a way that causes all system tables to become unavailable: https://github.com/cockroachdb/cockroach/pull/52148
  2. OOMs caused by an inverted index creation job (probably caused by not having memory accounting on the backfiller component).

In both cases, the customer does something which triggers the bad behavior. With 1, SRE could make system tables available again by rolling the nodes, but if the customer uses the problematic feature again, same issue.

This isn't sustainable. There are a lot of cloud clusters already. If the cause of some bad behavior is a known stream of queries, we need to be able to block that stream from running, without talking to a customer. Note that between 1 and 2, SRE got paged 5-6 different times over a three day period, including once at night, even though both issues were understood.

In general, SRE needs mitigations that don't involve code changes!

Relatedly, this makes the coding fix lower priority to release, which is good for DB eng and release eng (release eng made a cherry-pick release for SRE on Friday).

jordanlewis commented 4 years ago

While I agree we should really have this feature to protect SRE time and customer clusters, I'm not so sure your two examples are the best ones. In my mind, the use of the denylist is to help customers whose workloads are pushing the cluster over against their wishes, but who can't isolate which queries are doing it or stop just part of the workload without bringing everything down.

In this case, on the other hand, a user is trying and failing to run something over and over. Let's say we used the denylist to deny these queries from being run. What error would the user get after we added the denylist? Would the user actually be happier in this case?

joshimhoff commented 4 years ago

In this case, on the other hand, a user is trying and failing to run something over and over. Let's say we used the denylist to deny these queries from being run. What error would the user get after we added the denylist? Would the user actually be happier in this case?

Here are the two examples:

  1. Customer uses an experimental feature (tmp tables? related to tmp tables...) in such a way that causes all system tables to become unavailable: #52148
  2. OOMs caused by an inverted index creation job (probably caused by not having memory accounting on the backfiller component).

Re: 2, for a long while, the index creation job wasn't succeeding. It is strictly better for the user to have a non-OOMing cluster & a failing index creation job than an OOMing cluster & a failing index creation job. Also, creating an inverted index is not generally in the critical path of a customer's application. Some queries are more important to serve than others. SRE can't know exactly what is best for a particular customer, but there are rules of thumb: It's better to fail a very small number of queries (not kinds of queries but absolute # of queries) than most / all queries. It's better to fail DDL (certain statements especially, e.g. CC's SLA on backup success isn't tight yet) than DML.

Re: 1, my understanding is that the feature was experimental. I def think we should turn off an experimental feature to protect the overall health of a cluster.

It is certainly hard to generalize. But blacklisting queries is an option that the SRE oncall needs. Even more so once we run multi-tenant CRDB, since then there are other users to protect.

In this case, on the other hand, a user is trying and failing to run something over and over.

Ya... The user isn't necessarily aware of the effects of their actions on the reliability of their own CC cluster & app though...

Part of the deal with a managed service is you give up some control to the group running the service.

What error would the user get after we added the denylist?

Something that tells the user that the queries have been blacklisted to protect overall system health & to contact support if needed.

otan commented 4 years ago

cc @awoods187 @vy-ton

Looking at getting Angela started on this over the upcoming few weeks. I am proposing we scope this issue as a pure "deny-list" functionality, as opposed to a "rate limit" solution (which requires substantially more work and many more configuration settings and knobs).

I've started us off with strawman implementation ideas. When this has settled, we'll turn this into an RFC. Let me know if you prefer a google drive form!

EDIT: use this doc for collaboration


Strawman Denylist Parameters

Option 0: Exact Query Match denial

As it sounds: take the exact query text the user gives as the denylist entry. This does not work well when queries differ in details, e.g. in the WHERE clause, so for that it's probably a non-starter.

Option 1: Query Fingerprint

We can allow query fingerprints, similar to the ones shown on the list statements page. They look like this:

SELECT _, _, _ FROM _ WHERE _ = _ ORDER BY _

If we have a matching fingerprint, we deny the request. This has the side effect that it may block legit traffic which would otherwise be allowed, since we anonymise some of these fields.
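To illustrate why fingerprint matching can over-block, here is a deliberately naive, purely illustrative literal-stripping sketch (real fingerprinting works on the parsed AST, not on regexes):

package denylist

import "regexp"

var (
	stringLit = regexp.MustCompile(`'[^']*'`)
	numberLit = regexp.MustCompile(`\b\d+(\.\d+)?\b`)
)

// NaiveFingerprint replaces literals with "_", so
//   SELECT a, b FROM t WHERE c = 5
//   SELECT a, b FROM t WHERE c = 99
// map to the same key; that is also why a fingerprint rule can catch
// legitimate traffic that only differs in the anonymised fields.
func NaiveFingerprint(query string) string {
	out := stringLit.ReplaceAllString(query, "_")
	return numberLit.ReplaceAllString(out, "_")
}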

Option 2: Regex

Instead, allow specification of regexp to act as the denylist, e.g.

SELECT a, b, c FROM d WHERE e = .* ORDER BY f

This can get annoying as some elements may have to be escaped, e.g. SELECT ((a)) FROM b would have to be written as SELECT \(\(a\)\) FROM b. However, overall it gives more power to the user.
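A small sketch of what regex rules could look like, using regexp.QuoteMeta as one way around the escaping annoyance (function names are hypothetical):

package denylist

import "regexp"

// CompileRule anchors the pattern so a rule matches the whole statement
// rather than a substring of it.
func CompileRule(pattern string) (*regexp.Regexp, error) {
	return regexp.Compile("^(?:" + pattern + ")$")
}

// LiteralRule turns an exact statement into a rule without the user having
// to hand-escape characters such as the parentheses in SELECT ((a)) FROM b.
func LiteralRule(stmt string) (*regexp.Regexp, error) {
	return CompileRule(regexp.QuoteMeta(stmt))
}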

Option 3: All of the above

Offer all of the above. Makes the configuration much more complicated, but covers all cases.

Strawman Denylist Management

Here are a couple of strawman ideas for managing the denylist:

Option A: File-managed denylist

Store a denylist locally on every machine as a config file, e.g. query_denylist.yaml. A DB admin would have to scp this file onto every machine whenever it needs an update, and CockroachDB would watch for changes to this file. Easy solution to implement, hard for the customer to use as it requires machine access.

If a user wants to "dry-run" these changes, they could upload a query_denylist.staging.yaml file to all machines, which would log matching queries but not actually block them.

File Format?

YAML? JSON? Custom format? Which one is best?
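As a straw example of the YAML variant (file name and field names are purely illustrative, not a settled format):

rules:
  - type: fingerprint
    match: "SELECT _, _, _ FROM _ WHERE _ = _ ORDER BY _"
    dryrun: true
    message: "blocked pending a fix for a known bug; contact support"
  - type: regex
    match: "SELECT a, b, c FROM d WHERE e = .* ORDER BY f"
    dryrun: false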

Option B: CLUSTER SETTING

Use a cluster setting, e.g. SET CLUSTER SETTING sql_denylist = 'queryA, queryB, queryC', to be used as the denylist. This propagates the same way cluster settings do today and is easy enough to implement.

Unfortunately, a single string used as a denylist can be quite unmanageable. It most likely requires some sort of file format. However, we could do something like SET CLUSTER SETTING sql_denylist IMPORT FROM <stdin/nodelocal/etc>, which reads in the file as appropriate and sends it off as a string. This would likely also be useful for other things in the future.

Dry runs can be handled by a similar cluster setting, e.g. SET CLUSTER SETTING sql_staging_denylist.

A bit more work, and it still has the file management issue.

Option C: System table

Use a system table to store denylist queries. Create a table like so:

CREATE TABLE system.denylist (
   id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
   param STRING,
   type STRING, -- one of 'regex', 'fingerprint', 'raw'
   dryrun BOOL
)

Each node would cache this denylist, refreshing it from the table periodically (e.g. every 30 seconds). Alternatively, an update is sent via gossip that forces nodes to refresh whenever this table is updated (the latter sounds like something that should arguably already exist). Frequent polling can cause extra contention, so a gossip-triggered sync on top of a less frequent refresh sounds like a better idea.

We can create builtins which insert into this table, to make it easy to manage, e.g. SELECT insert_into_denylist('fingerprint', 'SELECT _ FROM _', false) (as opposed to INSERT INTO system.denylist VALUES (...)).

I'm not experienced enough with gossip / notify protocols to know whether such an update mechanism already exists, but if it does, this sounds like a good option.
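A rough sketch (again with hypothetical names) of the periodic refresh against the strawman system.denylist table; a gossip-triggered reload could call the same load function instead of waiting for the next tick:

package denylist

import (
	"context"
	"database/sql"
	"log"
	"time"
)

// DenyRule mirrors one row of the strawman system.denylist table.
type DenyRule struct {
	Param  string // fingerprint, regex, or raw query text
	Type   string // 'regex', 'fingerprint', or 'raw'
	DryRun bool
}

// RefreshLoop reloads the denylist every interval (e.g. 30s) and hands the
// result to apply, which would swap it into the per-node cache.
func RefreshLoop(ctx context.Context, db *sql.DB, interval time.Duration, apply func([]DenyRule)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if rules, err := loadOnce(ctx, db); err != nil {
			log.Printf("denylist refresh failed: %v", err)
		} else {
			apply(rules)
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

func loadOnce(ctx context.Context, db *sql.DB) ([]DenyRule, error) {
	rows, err := db.QueryContext(ctx, `SELECT param, type, dryrun FROM system.denylist`)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var rules []DenyRule
	for rows.Next() {
		var r DenyRule
		if err := rows.Scan(&r.Param, &r.Type, &r.DryRun); err != nil {
			return nil, err
		}
		rules = append(rules, r)
	}
	return rules, rows.Err()
}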

Option D: env variable/cli flag

Similar to A, except a user would have to set an env var / CLI flag. This would require restarting every node for any change.

Metrics

I assume we'll want all these metrics:

Messaging

The output message to the user should be something useful, e.g.

pgerror.Newf(pgcode.ConfigurationLimitExceeded, "this query was added to a denylist by the database administrator")
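A hedged sketch of attaching per-rule context to that error; ruleMessage would come from the denylist entry, and the import paths below simply reflect where pgcode/pgerror live in the CockroachDB repo rather than any settled API for this feature:

package denylist

import (
	"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgcode"
	"github.com/cockroachdb/cockroach/pkg/sql/pgwire/pgerror"
)

// denialError builds the user-facing error, optionally appending a custom
// message configured on the matching denylist rule.
func denialError(ruleMessage string) error {
	msg := "this query was added to a denylist by the database administrator"
	if ruleMessage != "" {
		// e.g. "this triggers a known bug; see the linked issue for workarounds"
		msg += ": " + ruleMessage
	}
	return pgerror.Newf(pgcode.ConfigurationLimitExceeded, "%s", msg)
}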

joshimhoff commented 4 years ago

THIS IS SO EXCITING!!!!

joshimhoff commented 4 years ago

Some thoughts:

  1. I think it makes sense to build option 1 or 2? Option 0 doesn't go far enough, does it?

  2. I think a dry run mode is a requirement. It's just too dangerous setting the deny list without a dry run mode. Agreed?

  3. I think another really important requirement is to be able to set this in emergencies, when a cluster may not be available.

More on 3. Here are the three options:

a. File
b. Cluster setting
c. System table

I dislike b & c since it may be hard to make a change via those mechanisms if the production incident is very severe.

a sounds good at first, but in the k8s environment (in the CC environment), it's hard to get access to the persistent disk if nodes are crashing in certain ways, e.g. if the CRDB nodes are OOMing. In this case, the containers don't stay up for long. There are ways around this, but they take time and they are painful.

The simplest interface for operators in the k8s environment is a CLI flag or env variable, IMHO. These can be set regardless of what is happening with the cluster, and k8s has good support for both.

otan commented 4 years ago

I think it makes sense to build option 1 or 2? Option 0 doesn't go far enough, does it?

Yeah. But it's the ultimate strawman :D

I think a dry run mode is a requirement. It's just too dangerous setting the deny list without a dry run mode. Agreed?

Agreed!

I think another really important requirement is to be able to set this in emergencies, when a cluster may not be available.

Heard.

it's hard to get access to the persistent disk if nodes is crashing in certain ways, e.g. if the CRDB nodes are OOMing

OOM would not stop you placing a file onto disk, would it?

The simplest interface for operators in the k8s environment is a CLI flag or env variable, IMHO. These can be set regardless of what is happening with the cluster, and k8s has good support for both.

Would that require a restart for every node in the cluster? Also, wouldn't managing a larger denylist be pretty difficult using env variables?

petermattis commented 4 years ago

My personal preference would be for this configuration to be done via a cluster setting, similar to the hba.conf setting. See server.host_based_authentication.configuration. It is true that we'd want to have safeguards in place against an admin shooting themselves in the foot with the denylist. Perhaps the config could have two modes for each regex/pattern: deny and log. That would allow the admin to test the pattern with log before enabling it with deny.

We probably also want to prevent denylisting certain queries, or queries from certain users. For example, denying SET CLUSTER SETTING .* could be problematic as then you'd never be able to adjust the denylist again. This wouldn't be a problem if the denylist did not apply to users who are able to set cluster settings.

An argument against using a cluster setting is that a query of death may be prohibiting the cluster from successfully executing such a query.

joshimhoff commented 4 years ago

We probably also want to prevent denylisting certain queries, or queries from certain users. For example, denying SET CLUSTER SETTING .* could be problematic as then you'd never be able to adjust the denylist again. This wouldn't be a problem if the denylist did not apply to users who are able to set cluster settings.

Good point!

An argument against using a cluster setting is that a query of death may be prohibiting the cluster from successfully executing such a query.

Ya this is my issue with using a cluster setting. I think if we use a cluster setting there needs to be another path to setting it that is available even if the cluster is not. Really, ideally, this applies to all cluster settings, when I think about it.

OOM would not stop you placing a file onto disk, would it?

On k8s it is hard to write to the disk if a cluster is OOMing. If OOMing, the containers don't stay up long enough for an operator to exec in (https://kubernetes.io/docs/tasks/debug-application-cluster/get-shell-running-container/) and write the file. There are ways around this (e.g. stop the CRDB pod from executing & mount the disk via another pod) but they are fairly high toil in the k8s environment. Setting a CLI flag is easy OTOH with k8s.

Would that require a restart for every node in the cluster?

Yes but I don't think this is a problem really. CRDB can handle rolling restarts. What do you dislike about this?

Also, wouldn't managing a larger denylist be pretty difficult using env variables?

Ya a bit tedious. A file is nice if there is some way for an operator to easily set it in the k8s environment even when nodes are crashing. Maybe there is a way. Will need to think more with my SRE friends. Possibly a k8s config map that is mounted somewhere on the filesystem (that is, not the store dir). So to y'all, just a file, but not in the store directory.

https://kubernetes.io/docs/concepts/configuration/configmap/

chrisseto commented 4 years ago

Swooping in with some scope creep, would it be possible to include a message with denied queries?

Hypothetically, if the host of the CRDB instance doesn't have a direct line of communication to the end user, it may be helpful to provide some context or further instructions when a query is rejected.

IE: "Hey this is a bad query try ___ instead" Or "This triggers a known bug. See [github link] for work arounds and details."

EDIT: swapped <> with []

otan commented 4 years ago

IE: "Hey this is a bad query try ___ instead" Or "This triggers a known bug. See for work arounds and details."

Yep, that (should be) easy enough. I've added it to the comment.

otan commented 4 years ago

Ya a bit tedious. A file is nice if there is some way for an operator to easily set it in the k8s environment even when nodes are crashing. Maybe there is a way. Will need to think more with my SRE friends. Possibly a k8s config map that is mounted somewhere on the filesystem (that is, not the store dir). So to y'all, just a file, but not in the store directory.

Let me know what happens here. It sounds like, for foolproofness, the file / CLI flag / env var approach is the minimum requirement.

I think it's possible we may want files and cluster settings, for ease of use, but file seems to be the most foolproof way. Would be nice to get input from other users as well.

JuanLeon1 commented 4 years ago

I like the config map because it sticks out in GitHub PRs like a sore thumb after you have made a cluster into a pet.
Also, I think k8s can implement the config map as a file, which, from the DB side, reduces to the file case and does not require a node restart. Eventually, config maps could also be read by an "admin" service in the pod that could do some of the other things we need to do remotely (network-layer-throttling-by-principal, e.g.), but that is ill-defined and further in the future.

DuskEagle commented 4 years ago

I think reading from a file, the path of which can be specified via a flag to cockroach start, provides the best operator experience. E.g. cockroach start --sql-denylist=/path/to/denylist. Like Juan said, this also works well in a K8s environment, where you can manage the denylist using a ConfigMap which is projected to a file within the CRDB pod. If CRDB can watch for changes to this file and reload the denylist, then no restart is necessary.
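For illustration, a hypothetical ConfigMap along these lines (names, paths, and the --sql-denylist flag itself are all assumptions from this thread, not existing CockroachDB behavior):

apiVersion: v1
kind: ConfigMap
metadata:
  name: crdb-sql-denylist
data:
  query_denylist.yaml: |
    rules:
      - type: fingerprint
        match: "SELECT _, _, _ FROM _ WHERE _ = _ ORDER BY _"
        dryrun: true

The CRDB pod spec would then mount crdb-sql-denylist as a volume (say at /etc/crdb-denylist) and pass --sql-denylist=/etc/crdb-denylist/query_denylist.yaml to cockroach start; updates to the ConfigMap show up in the projected file without a restart, subject to the kubelet's sync delay.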

I think it's possible we may want files and cluster settings, for ease of use, but file seems to be the most foolproof way. Would be nice to get input from other users as well.

I can see this as well. Using files is useful for operators that have filesystem-level access to the cluster, and is also necessary when attempting to configure a denylist when the cluster is unavailable. For users with only SQL-level access, a cluster setting would be useful. But like @petermattis mentioned, we'd have to be careful about locking users out of their cluster by denylisting SET CLUSTER SETTING queries.

I'm not convinced about the idea that the denylist should not apply to users who are able to set cluster settings - many potentially destabilizing queries, such as large bulk-io operations, are already restricted to admin users. As an operator, if you're trying to denylist a particular query pattern, you'd have to check if the query is originating from an admin user or not to know if the denylist entry would have any effect.

petermattis commented 4 years ago

I think reading from a file, the path of which can be specified via a flag to cockroach start, provides the best operator experience. E.g. cockroach start --sql-denylist=/path/to/denylist.

Will --sql-denylist apply only to the current node? Is there any concern about different nodes having different denylists?

If CRDB can watch for changes to this file and reload the denylist, then no restart is necessary.

Watching for changes to a file isn't something CRDB typically does (I'm not sure if there are existing examples of such behavior). We could add a facility to reload the file via a SQL builtin or similar.

otan commented 4 years ago

Moving this to a doc because there are a lot of threads to pull on here.

angelapwen commented 3 years ago

PR for the RFC is up at #55778, borrowing heavily from this discussion and the brainstorming doc!

rafiss commented 3 years ago

Leaving this in the long-term backlog since the project focus was changed to making feature-specific cluster settings. If we need a real denylist later, we can revisit.