cockroachdb / cockroach-operator

k8s operator for CRDB

operator incorrectly uses `--logtostderr` + can't use new --log flag with Operator #613

Open taroface opened 3 years ago

taroface commented 3 years ago

I attempted to specify the new --log flag in the custom resource:

  additionalArgs:
    - "--log='sinks: {file-groups: {ops: {channels: [OPS, HEALTH]}}}'"

This resulted in the pods crashing:

NAME                                 READY   STATUS             RESTARTS   AGE
cockroach-operator-d4df75996-pdpmd   1/1     Running            0          135m
cockroachdb-0                        0/1     CrashLoopBackOff   11         34m
cockroachdb-1                        0/1     CrashLoopBackOff   11         34m
cockroachdb-2                        0/1     CrashLoopBackOff   11         34m

The kubectl logs show:

++ expr 8192 / 4
++ expr 8192 / 4
+ exec /cockroach/cockroach.sh start --join=cockroachdb-0.cockroachdb.default:26258,cockroachdb-1.cockroachdb.default:26258,cockroachdb-2.cockroachdb.default:26258 --advertise-host=cockroachdb-0.cockroachdb.default --logtostderr=INFO --certs-dir=/cockroach/cockroach-certs/ --http-port=8080 --sql-addr=:26257 --listen-addr=:26258 --cache 2048MiB --max-sql-memory 2048MiB '--log=sinks: {file-groups: {ops: {channels: [OPS, HEALTH]}}}'
Flag --logtostderr has been deprecated, use --log instead to specify 'sinks: {stderr: {filter: ...}}'.
E210621 20:27:35.192687 1 1@cli/error.go:399  [-] 1  ERROR: --log is incompatible with legacy discrete logging flags
ERROR: --log is incompatible with legacy discrete logging flags

Since --logtostderr is deprecated in v21.1 (and according to @knz we shouldn't be logging to stderr anyway), perhaps this line in the code should be removed. I believe the logging behavior would then default to this configuration.

If we do need to log to stderr on Kubernetes, I believe the equivalent v21.1 flag is:

--log="sinks: {stderr: {channels: all, filter: INFO}}"

It looks like --logtostderr is also used in a couple of .golden files, but I'm not sure how relevant those are.

knz commented 3 years ago

The flag --logtostderr should not be used in our reference kubernetes configs.

In fact, we should never recommend logging to stderr. So even the proposed --log config would not be desirable. The idea to force logging to stderr is a system design mistake.

knz commented 3 years ago

I think the proper recommendation to move forward is as follows:

chrislovecnm commented 3 years ago

Is this a version-related issue? Are we going to have problems with different supported versions of crdb?

chrislovecnm commented 3 years ago

https://github.com/cockroachdb/cockroach-operator/blob/8e7c5b6b961e5502692aa7a29289b19d88196696/pkg/resource/statefulset.go#L340

This is where we are setting logging.

knz commented 3 years ago

The fact --logtostderr is being used at all is an issue that exists for all crdb versions. This should not have been used from the start.

Additionally, starting in v21.1, if the user wants to customize their startup flags with a custom --log, that is made impossible because the orchestration currently uses --logtostderr.

taroface commented 3 years ago

cc @udnay for visibility

chrislovecnm commented 3 years ago

I need some requirements.

knz commented 3 years ago

> what do we have for default?

The default config is to write logging output to files cockroach-xxx.log in the logs sub-directory of the data directory. This is good and exactly what we need for troubleshooting.

There is no operational or troubleshooting value for a production cluster provided by stderr output via --logtostderr.

Therefore, the requirement is to remove this flag from the config.

> we need an api value to do specific configurations

The --log flag accepts a YAML payload (or use --log-config-file to point to a file containing one). An operator should be able to provide a custom YAML payload, and the flag should be passed to crdb only if such a payload was provided.
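
To illustrate (a sketch only; exact field names per the v21.1 logging config reference), a custom payload that routes the operational channels to their own file group and keeps stderr quiet could look like:

sinks:
  file-groups:
    # dedicated file group for operational events, written under the logs directory
    ops:
      channels: [OPS, HEALTH]
  # no stderr output by default, per the discussion above
  stderr:
    filter: NONE

The operator would only pass this through --log (or write it to a file and use --log-config-file) when the user actually supplied it.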

chrislovecnm commented 3 years ago

Yeah, the default configuration is funky for k8s. We always need logging to the console, and I think that is why my predecessor used that flag.

We need to use --log, but putting YAML inside a YAML file might be fun... 🤔 I have not done that before.

This brings up a good point: users can break logging pretty easily. CRDB's logging defaults assume a binary running outside of a container, logging to a file. Do we want to check that a user is not logging to a file or is at least always logging to console and possibly another place?

chrislovecnm commented 3 years ago

@piyush-singh can you please document the business requirements? And keep in mind we always need logging to the console.

knz commented 3 years ago

> at least always logging to console and possibly another place?

I'm saying we should not log to the console by default.

If something gets logged to the console, make it only the OPS channel.

chrislovecnm commented 3 years ago

@knz yah I got you, but how does kubectl log work?

knz commented 3 years ago

the mistake you're making is assuming we want to see crdb logs in the output of kubectl log. That is incorrect. The useful data we need to extract for troubleshooting is not present in the crdb logs.

It's good to see the default stderr output (which is just a server startup message) in the output of kubectl log. We might have some interest in seeing the output of the OPS logging channel (new in 21.1) via kubectl log, but I don't know that we have evidence for that (yet).

Whatever additional detailed logging is produced by crdb should not be available via kubectl log. The logging is largely structured and inadequate for channeling through kube's simplistic logging output. We should instead require an operator to run cockroach debug zip or some other related command to extract operational details.

udnay commented 3 years ago

@knz how does cockroach debug zip work? Will it only zip up logs in a specific directory? On k8s the norm is to log app data to stdout as logs in a pod are ephemeral unless the log dir is mounted directly from the host.

Without exec'ing into a pod, how would the operator see the logs if they bypass k8s logging facility?

chrislovecnm commented 3 years ago

@udnay yeah, we would need to do a host mount or PVC to make the logs non-ephemeral if we want to do some complex logging.

My thought is to keep it simple and meet the k8s logging standard. To your point, k8s users are going to expect logs going to stdout, or we really break how logging works in k8s. All of the hosted k8s solutions have log aggregation that relies on console logging.

Yes we may have some edge cases for disk logging, but we need a simple solution first.

I think the “debug zip” is another problem that we need to tackle as well; we may have an issue open already. IMO, let's use this issue for talking about logging and open a different issue for debug.

knz commented 3 years ago

ok so this is still a "bug" issue, not "enhancement".

I think we've been talking past each other and somehow I have not been successful at conveying how serious the situation is.

I'm going to rephrase this more clearly.

As of today, the current operator code with respect to logging is misdesigned. What is being done was incorrect from the start, and the fact that nobody noticed does not detract from the fact that it's a problem and needs to be corrected. (See this definition.)

What is wrong exactly?

  1. the flag --logtostderr exposes sensitive information to the process's stderr in a way that does not provide redactability. Consequently, plugging that output into the kubectl log command can defeat IT policies about confidentiality.

    This is a serious issue that multiple customers complained about in the 19.x cycle. We solved it in 20.1/20.2 by creating product features and generally documenting that stderr logging should not be used, but somehow this operator project did not get the memo.

  2. the flag --logtostderr creates output that cannot be reliably parsed. Presumably, customers who integrate with orchestration want the ability to process logging data automatically. The output produced by crdb on its stderr stream fails at this. Even though it looks parseable on the surface, it is a documented fact that under stress and in problem situations the log entries become malformed. This is not fixable via stderr logging, but is fixable via file or network logging.

    Right now the default config used by the k8s operator code is mistakenly giving users the impression they can rely on the logging output by kubectl log and automate log monitoring etc based on that data. This is a lie to our users/customers.

  3. you and others who are working on the operator project are working under the mistaken impression that "logging" is somehow distinct from "database state". I see this for example in your suggestion that "if we store log files we'd need ephemeral storage" or the question "how would the operator look at the logs without execing into the node". That is something we need to work on culturally.

    In a nutshell, crdb logging data has the same quality as general stored SQL data.

    We don't think about "how would the operator access the SQL data without execing into the node", right? To access SQL data, we open a SQL client.

    The same applies to crdb logs. To access the log data, we use the log retrieval API (over HTTP).

    Generally, the log files should be treated with the same level of care for persistence and confidentiality as the data files: they are absolutely crucial to preserve durably, just like the crdb store data directories, and that's specifically the reason why crdb chooses a logs sub-directory of the store directory by default. If the customer cares about at-rest confidentiality, the log files should be encrypted together with the data files.

    Right now, we are lucky that the flag --logtostderr merely copies log data to stderr but still preserves crdb's default behavior, which is to also create discrete files in the logs directory. The statefulset code, whether intentionally or not, was not implemented to explicitly disable writing logging data to files in the store directory, and that's a good thing!

  4. starting in v21.2, the --logtostderr flag will be removed. So the current startup code will break.

  5. the best practice with distributed systems with crdb is to aggregate the logging output from all the nodes into a network logging collector. Arguably, it is the k8s operator's role to spawn a logging collector, such as Fluentd, alongside the crdb nodes inside the k8s cluster and "plug" the services together by making the logging config inside crdb point to the logging collector. This way the crdb nodes would send their data reliably, in a parseable format, to Fluentd, where it can be stored reliably and confidentially.

    This organization for log collection is what both our on-prem customers and our CC teams have requested and we've built in 21.1. It is legitimately the best practice for SREs and our k8s operator should definitely exemplify this best practice.

    I am surprised this hasn't been discussed yet. (A config sketch follows right after this list.)
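
To make point 5 concrete, here is a sketch (not a tested config; the service address is hypothetical and the field names are from the 21.1 logging config format) of a --log payload pointing crdb at an in-cluster Fluentd collector:

sinks:
  fluent-servers:
    # hypothetical Fluentd service deployed by the operator alongside the crdb pods
    logs:
      channels: all
      address: fluentd.default.svc.cluster.local:5170
      net: tcp
  # keep stderr limited to operational events, if console output is wanted at all
  stderr:
    channels: OPS
    filter: INFO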

So these are the problems and now let's review the suitable actions:

Problem (1): confidentiality leaks
  Current in v20.x / v21.1: yes
  Future severity: high (and likely to blow up when customers realize we're not delivering the level of redactability via k8s that they're expecting from our 20.2/21.1 docs)
  Action to take: don't use stderr logging; keep the redactable logging streams either on disk or use network logging

Problem (2): broken logging format
  Current in v20.x / v21.1: yes
  Future severity: high, because this will explode in our face when a customer notices that the log monitoring they've implemented was unable to catch the symptoms of a severe outage. This will persist in v21.2 because this is unfixable on our side (bug in the Go runtime system)
  Action to take: don't use stderr logging; use either files using the crdb-v2 format or JSON with a network collector

Problem (3): mistaken understanding about the nature of logs
  Current in v20.x / v21.1: yes
  Future severity: moderate, but our Tech Support and L2 engineers will dislike the lack of log separation when they are asked to troubleshoot k8s deployments
  Action to take: don't use stderr logging, and acknowledge/document log access over HTTP

Problem (4): flag is deprecated
  Current in v20.x / v21.1: no
  Future severity: high, when the k8s operator simply stops working with v21.2
  Action to take: don't use the --logtostderr flag

Problem (5): lack of logging cohesion
  Current in v20.x / v21.1: yes
  Future severity: moderate; will become higher when customers deploy more complex setups, including multi-region setups
  Action to take: don't use stderr logging, and set up an initiative to extend the k8s operator to deploy network log collectors alongside crdb nodes

Let me know if you have further questions.

udnay commented 3 years ago

Ok, so --logtostderr should be removed; @piyush-singh will provide the logging settings that are backwards compatible and provide redaction. This will use CRDB's best practices for log redaction and be backwards compatible with the CRDB versions that the operator needs to support.

My strong feeling is that we need to follow the principle of least surprise: a new user should be able to see the application logs within a pod using k logs. As a feature enhancement, we will add configurability to the CR API that will let a user fully configure their logging. For GA, this is too late to add.

knz commented 3 years ago

Some minimum amount of safe and useful logging can be provided in 21.1 by copying the events from the OPS logging channel to stderr.

piyush-singh commented 3 years ago

Okay, so I took some time to explore the solution space here and I have a few updates.

For version 21.1 and beyond, we can capture the appropriate log messages using the following log config on startup:

cockroach start --log='sinks: {stderr: {channels: OPS, redact: true}}' <other flags>

This will pass only redacted messages from the OPS channel to the stderr sink, as Raphael pointed out above.
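
If an inline flag is awkward to manage in the operator, the same payload should (I believe) also be usable via --log-config-file, saved as a standalone YAML file, e.g.:

# same configuration as the inline --log flag above:
# only redacted OPS-channel events go to stderr
sinks:
  stderr:
    channels: OPS
    redact: true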

For versions 20.1 and 20.2 that we are supporting on the Operator the story is a bit more complicated. I spoke with Raphael, Josh on our SRE team for Cloud, and Daniel from the SE team about expectations for K8s logs. On CockroachCloud, we actually do not log anything out to stderr (outside of the initial logs emitted on node startup) in K8s and instead the SREs exec into pods to check logs. Josh noted that this ran against his expectations initially but this has become standard practice and the team is now used to doing this. Of course, this still runs the risk that if the node is crashing, you won't be able to access logs that are persisted. Josh noted that having the full unredacted logs available somewhere is important as an operator though.

Daniel from our field team had a similar concern. When he has stood up K8s instances for customers, he usually sets up a sidecar pod to separately persist logs or streams them out to a log collection service. This lines up with our long term vision, but the complexity of setting up a sidecar for logging is too much for the GA timeline.

Raphael suggested that we may be able to backport some log redaction functionality into the 20.x releases. @knz would you please share how much of an effort you would expect that to be? I'm very hesitant to go with this route for the GA release for the operator in ~2 weeks.

In sum, we have 3 options here:

  1. Keep the 20.x releases as they are with all logs going to stderr. There is a security risk inherent in doing this as those logs will contain sensitive information like PKs from the database (and much more).
  2. Try to modify stderr logging behavior for 20.x release via backport.
  3. Remove stderr logging, keeping only what is printed out on node startup. End users will need to exec into pods or use the HTTP logging endpoints to access logs.

Given our time constraints and that the SRE team has been using option 3 in prod in CockroachCloud, I would lean towards option 3. Versions 20.x will only be supported for another year, and depending on our TBD support policy for the operator, we may even sunset those versions earlier. After that, all supported versions will be able to use the above setting, available from v21.1 and beyond.

knz commented 3 years ago

> @knz would you please share how much of an effort you would expect that to be? I'm very hesitant to go with this route for the GA release for the operator in ~2 weeks.

The change I have in mind is specifically to truncate log entries to just 1 line on stderr when a flag is specified, and to do pre-redaction. This would be compatible with 20.1 and 20.2, although the pre-redaction would not work on 20.1 (redactability was a 20.2 feature), so we wouldn't be solving all the problems that way.

chrislovecnm commented 3 years ago

@piyush-singh what are we doing for the release? This is a long issue, and sorry if it is buried in the issue somewhere

piyush-singh commented 3 years ago

Because we have 1 week until the release, I don't think we should pick up this work now. Keith pointed out that our Helm chart and statefulset configs also have the --logtostderr flag set; given that this has been around for multiple years, we shouldn't try to solve it in the week ahead of the release. I went ahead and filed #622 as a follow-on enhancement for after the release is out the door. I can also work with Raphael to address the v20.1 and v20.2 issues, which should cover most of our bases.

PiotrKozimor commented 2 years ago

Just to give another perspective: on GKE with the Cloud Logging integration, all logs from CRDB appear as errors. [screenshot]