noslowerdna opened 2 years ago
When you (mis-)configure Elasticsearch with a file realm secret that does not exist you should see two things: the message "referenced secret not found", and the name and namespace of the affected resource.
This is tricky. A user might configure file realm users for an already running cluster, which we cannot simply stop anymore.
I guess what we could investigate is whether it would be more appropriate to add these warnings to the Elasticsearch status sub-resource instead of creating Kubernetes events.
@pebrc Thanks for the quick response and information. Understand it's a tricky situation. I did just now confirm that we do see a log entry (should these perhaps be logged at warning or error level instead of info?),
{"log.level":"info","@timestamp":"2022-04-29T19:37:11.818Z","log.logger":"elasticsearch-user","message":"referenced secret not found","service.version":"2.1.0+02a8d7c7","service.type":"eck","ecs.version":"1.4.0","namespace":"elasticsearch","es_name":"my-cluster","secret_name":"my-cluster-es-roles-does-not-exist"}
{"log.level":"info","@timestamp":"2022-04-29T19:37:12.271Z","log.logger":"elasticsearch-user","message":"referenced secret not found","service.version":"2.1.0+02a8d7c7","service.type":"eck","ecs.version":"1.4.0","namespace":"elasticsearch","es_name":"my-cluster","secret_name":"my-cluster-es-users-does-not-exist"}
as well as the events,
elasticsearch 10s Warning Unexpected elasticsearch/my-cluster referenced secret not found: my-cluster-es-roles-does-not-exist
elasticsearch 9s Warning Unexpected elasticsearch/my-cluster referenced secret not found: my-cluster-es-users-does-not-exist
Apologies for not being thorough before opening an issue. I was searching for the logs in Splunk improperly, and didn't look for the Kubernetes events. It was beyond the event retention time anyway. I think we only retain them for an hour.
Maybe clarifying the expected behavior in the relevant ECK documentation for this hopefully rare "referenced secret not found" scenario is a simple way to alleviate any concerns here.
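For context, the kind of configuration involved: ECK lets the Elasticsearch manifest reference file realm and role Secrets, and each reference must name an existing Secret in the same namespace. A minimal sketch (the secret names and Elasticsearch version here are illustrative, not taken from this cluster):

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: my-cluster
  namespace: elasticsearch
spec:
  version: 8.2.0                        # illustrative version
  auth:
    # Each entry must reference an existing Secret in the same namespace;
    # if it does not, ECK logs "referenced secret not found" and emits a
    # Kubernetes event, but the cluster still starts.
    fileRealm:
    - secretName: my-cluster-filerealm  # illustrative name
    roles:
    - secretName: my-cluster-roles      # illustrative name
  nodeSets:
  - name: default
    count: 1
```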
Bug Report
Not entirely sure this qualifies as a bug, but it was somewhat unexpected and confusing behavior.
What did you do?
Deployed an Elasticsearch cluster using ECK with invalid (nonexistent) secret names for creating the file realm users and roles specified in the Elasticsearch CRD manifest.

The full story here is that we are using a custom Helm chart and were attempting Spinnaker integration, which by default unexpectedly added a version suffix (-v000) to the secret names. In a follow-up change we opted out of resource versioning for these secrets using the special annotation strategy.spinnaker.io/versioned = false.
Then everything matched up between our secret manifests and the Elasticsearch manifest, and things worked.

What did you expect to see?
As this was a serious misconfiguration issue, we expected that the Elasticsearch cluster would fail to start up, with an error in the logs (probably the eck-operator's?) indicating that the configured auth secret resources were not found.

What did you see instead? Under which circumstances?
Instead it seems that the problem was just silently ignored. The Elasticsearch cluster started up cleanly and reached a health status of green. However, our custom users and roles were not set up, so we had only the elastic superuser available. We later searched the Kubernetes log files (eck-operator as well as master and data nodes), which we forward into Splunk, for that timeframe and did not find any references to the invalid secret names.

Environment
ECK version: 2.1.0
This happened in our development environment, so it was not a production outage situation.
On-prem, Konvoy/Kommander
N/A