elastic / cloud-on-k8s

Elastic Cloud on Kubernetes

Validate file realm user and role secret names #5626

Open noslowerdna opened 2 years ago

noslowerdna commented 2 years ago

Bug Report

Not entirely sure this qualifies as a bug, but it was somewhat unexpected and confusing behavior.

What did you do?

Deployed an Elasticsearch cluster using ECK with invalid (nonexistent) secret names for creating file realm users and roles specified in the Elasticsearch CRD manifest.

The full story here is that we are using a custom Helm chart and were attempting Spinnaker integration, which by default added a version suffix (-v000) to the secret names. We did not expect this. In a follow-up change we opted out of resource versioning for these secrets using the special annotation strategy.spinnaker.io/versioned = false. After that, the secret names in our secret manifests matched those in the Elasticsearch manifest, and things worked.

What did you expect to see?

As this was a serious misconfiguration, we expected that the Elasticsearch cluster would fail to start up, with an error in the logs (probably the eck-operator's?) indicating that the configured auth secret resources were not found.

What did you see instead? Under which circumstances?

Instead, it seems that the problem was silently ignored. The Elasticsearch cluster started up cleanly and reached a health status of green. However, our custom users and roles were not set up, so we had only the elastic superuser available. We later searched the Kubernetes log files for that timeframe (eck-operator as well as master and data nodes), which we forward into Splunk, and did not find any references to the invalid secret names.

Environment

ECK version: 2.1.0

This happened in our development environment, so it was not a production outage situation.

Kubernetes: on-prem, Konvoy/Kommander

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/arm64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.6", GitCommit:"d921bc6d1810da51177fbd0ed61dc811c5228097", GitTreeState:"clean", BuildDate:"2021-10-27T17:44:26Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}

kind: Elasticsearch
apiVersion: elasticsearch.k8s.elastic.co/v1
spec:
  auth:
    fileRealm:
    - secretName: my-es-users
    roles:
    - secretName: my-es-roles
---
kind: Secret
apiVersion: v1
metadata:
  name: my-es-users-v000
---
kind: Secret
apiVersion: v1
metadata:
  name: my-es-roles-v000

Logs: N/A
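
A name mismatch like the one in the manifests above could be caught before deployment. The following is a minimal sketch, not part of ECK: it assumes the manifests have already been parsed into plain dictionaries (for example by a YAML loader), and the function names referenced_auth_secrets and missing_secrets are hypothetical. It only compares names and does not check namespaces or secret contents.

```python
from typing import Any, Dict, Iterable, List


def referenced_auth_secrets(es_manifest: Dict[str, Any]) -> List[str]:
    """Collect secret names listed under spec.auth.fileRealm and spec.auth.roles."""
    auth = es_manifest.get("spec", {}).get("auth", {})
    names: List[str] = []
    for key in ("fileRealm", "roles"):
        for entry in auth.get(key) or []:
            name = entry.get("secretName")
            if name:
                names.append(name)
    return names


def missing_secrets(
    es_manifest: Dict[str, Any], manifests: Iterable[Dict[str, Any]]
) -> List[str]:
    """Return referenced secret names that no Secret manifest defines."""
    defined = {
        m.get("metadata", {}).get("name")
        for m in manifests
        if m.get("kind") == "Secret"
    }
    return [n for n in referenced_auth_secrets(es_manifest) if n not in defined]


# Manifests from this report, written as dictionaries.
es = {
    "kind": "Elasticsearch",
    "apiVersion": "elasticsearch.k8s.elastic.co/v1",
    "spec": {
        "auth": {
            "fileRealm": [{"secretName": "my-es-users"}],
            "roles": [{"secretName": "my-es-roles"}],
        }
    },
}
secrets = [
    {"kind": "Secret", "apiVersion": "v1", "metadata": {"name": "my-es-users-v000"}},
    {"kind": "Secret", "apiVersion": "v1", "metadata": {"name": "my-es-roles-v000"}},
]

print(missing_secrets(es, secrets))  # ['my-es-users', 'my-es-roles']
```

A check like this could run in a CI step after the chart is rendered, so that a suffix added by tooling is flagged before the cluster is created.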

pebrc commented 2 years ago

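When you (mis-)configure Elasticsearch with a file realm secret that does not exist, you should see two things: a "referenced secret not found" entry in the operator log and a Kubernetes warning event on the Elasticsearch resource.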

> we expected that the Elasticsearch cluster would fail to start up

This is tricky. A user might configure file realm users for an already running cluster, which we cannot stop at that point.

I guess what we could investigate is whether it would be more appropriate to add these warnings to the Elasticsearch status sub-resource instead of creating Kubernetes events.

noslowerdna commented 2 years ago

@pebrc Thanks for the quick response and information. Understand it's a tricky situation. I did just now confirm that we do see a log entry (should these perhaps be logged at warning or error level instead of info?),

{"log.level":"info","@timestamp":"2022-04-29T19:37:11.818Z","log.logger":"elasticsearch-user","message":"referenced secret not found","service.version":"2.1.0+02a8d7c7","service.type":"eck","ecs.version":"1.4.0","namespace":"elasticsearch","es_name":"my-cluster","secret_name":"my-cluster-es-roles-does-not-exist"}
{"log.level":"info","@timestamp":"2022-04-29T19:37:12.271Z","log.logger":"elasticsearch-user","message":"referenced secret not found","service.version":"2.1.0+02a8d7c7","service.type":"eck","ecs.version":"1.4.0","namespace":"elasticsearch","es_name":"my-cluster","secret_name":"my-cluster-es-users-does-not-exist"}

as well as the events,

elasticsearch                  10s         Warning   Unexpected             elasticsearch/my-cluster                                                referenced secret not found: my-cluster-es-roles-does-not-exist
elasticsearch                  9s          Warning   Unexpected             elasticsearch/my-cluster                                                referenced secret not found: my-cluster-es-users-does-not-exist

Apologies for not being thorough before opening an issue. I was searching for the logs in Splunk improperly, and didn't look for the Kubernetes events. It was beyond the event retention time anyway. I think we only retain them for an hour.
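
For anyone searching exported operator logs for the same problem, here is a minimal sketch of a filter over JSON log lines. It assumes one JSON object per line with the message, namespace, es_name and secret_name fields seen in the entries above; the function name missing_secret_entries is hypothetical, and the sample lines are shortened versions of those entries.

```python
import json
from typing import Iterable, List, Tuple


def missing_secret_entries(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (namespace, es_name, secret_name) for each 'referenced secret not found' entry."""
    hits: List[Tuple[str, str, str]] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict):
            continue
        if record.get("message") == "referenced secret not found":
            hits.append(
                (
                    record.get("namespace", ""),
                    record.get("es_name", ""),
                    record.get("secret_name", ""),
                )
            )
    return hits


sample = [
    '{"log.level":"info","log.logger":"elasticsearch-user",'
    '"message":"referenced secret not found","namespace":"elasticsearch",'
    '"es_name":"my-cluster","secret_name":"my-cluster-es-roles-does-not-exist"}',
    "a line that is not JSON",
    '{"log.level":"info","message":"some other message"}',
]

for namespace, es_name, secret_name in missing_secret_entries(sample):
    print(f"{namespace}/{es_name}: missing secret {secret_name}")
```

Matching on the message field rather than on log.level matters here, because these entries are logged at info level.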

Maybe clarifying the expected behavior in the relevant ECK documentation for this hopefully rare "referenced secret not found" scenario is a simple way to alleviate any concerns here.