flash1293 opened this issue 1 month ago
Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)
This log error does not say anything about why the agent crashed. We would need to reproduce the environment, check the diagnostics, and look at the agent pod's resource consumption.
@MichaelKatsoulis
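For context, a sketch of how the diagnostics mentioned above could be collected; the namespace, label, and pod name are assumptions based on the default standalone manifests:

```sh
# List the agent pods (label and namespace assumed from the default manifests)
kubectl get pods -n kube-system -l app=elastic-agent-standalone

# Collect a diagnostics bundle from inside one of the pods
# (replace <pod-name> with a name from the listing above)
kubectl exec -n kube-system <pod-name> -- elastic-agent diagnostics
```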
I started a local minikube cluster, then followed the onboarding flow from a fresh Observability serverless project on prod.
I replicated the scenario by following the instructions for monitoring Kubernetes as if I were a first-time user.
I noticed the following:
The kustomize command attempts to override the Elasticsearch host by setting
-e "s/%ES_HOST%/https:\/\/katsoulis-serverless-f68892.es.us-east-1.aws.elastic.cloud/g"
In the absence of a port, Elastic Agent appends a default port, 9200, so the ES_HOST ends up as https://katsoulis-serverless-f68892.es.us-east-1.aws.elastic.cloud:9200.
This leads to a connection refused error. To work around it, we need to modify the ES_HOST to
-e "s/%ES_HOST%/https:\/\/katsoulis-serverless-f68892.es.us-east-1.aws.elastic.cloud:443/g"
With that change, Elastic Agent starts successfully and data is flowing.
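To show where the substitution sits, here is a sketch of the kind of command the onboarding flow generates, with the port fix applied; the kustomize URL, the ref, and the %API_KEY% placeholder are assumptions, not the exact generated command:

```sh
kubectl kustomize "https://github.com/elastic/elastic-agent/deploy/kubernetes/elastic-agent-kustomize/default/elastic-agent-standalone?ref=v8.15.1" \
  | sed -e "s/%API_KEY%/<api-key>/g" \
        -e "s/%ES_HOST%/https:\/\/katsoulis-serverless-f68892.es.us-east-1.aws.elastic.cloud:443/g" \
  | kubectl apply -f -
```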
The first thing a user sees is a link to a dashboard that does not exist!
In Discover we can see metrics and logs.
After a few minutes we see the first restart of one of the agent pods; the reason is OOMKilled.
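The OOMKilled reason can be confirmed with something like this (namespace and label are assumptions based on the default manifests):

```sh
# Restart counts for the agent pods
kubectl get pods -n kube-system -l app=elastic-agent-standalone

# Last termination reason per pod; prints OOMKilled for the restarted one
kubectl get pods -n kube-system -l app=elastic-agent-standalone \
  -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}{end}'
```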
Conclusion:
Restarts: As per my analysis and tests in https://github.com/elastic/elastic-agent/issues/4729#issuecomment-2355352224, in version 8.15.1 elastic-agent with the Kubernetes and System integrations needs more than 700MB of memory, so the limit is set too low and causes the restarts (see the sketch after this list for one way to raise it).
Dashboard: Should the Kubernetes integration, which contains the dashboard assets, also be installed under the hood?
ES_HOST: We should always set the Elasticsearch port explicitly, because when it is not set the agent appends 9200.
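A minimal sketch of raising the memory limit, assuming the DaemonSet is named elastic-agent-standalone in the kube-system namespace as in the default standalone manifests; the 1Gi value is illustrative, not a validated recommendation:

```sh
# Raise the memory request/limit on the agent DaemonSet
# (DaemonSet name and namespace are assumptions from the default manifests)
kubectl -n kube-system set resources daemonset/elastic-agent-standalone \
  --requests=memory=1Gi --limits=memory=1Gi
```

Patching like this is only a local workaround; the default limit shipped in the generated manifests is what would need to change.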
Thanks for the investigation @MichaelKatsoulis!

> Restarts: As per my analysis and tests in https://github.com/elastic/elastic-agent/issues/4729#issuecomment-2355352224, in version 8.15.1 elastic-agent with the Kubernetes and System integrations needs more than 700MB of memory, so the limit is set too low and causes the restarts.
I guess this is something that needs to be changed on the elastic-agent side, right?
> Dashboard: Should the Kubernetes integration, which contains the dashboard assets, also be installed under the hood?
Good catch; it seems like the ID of the dashboard changed in this PR: https://github.com/elastic/integrations/pull/10593. We should fix it short-term, but we need to think about how to make this whole process more stable.
> ES_HOST: We should always set the Elasticsearch port explicitly, because when it is not set the agent appends 9200.
I see. I think in a previous version it would append the port, but the config value we pull this from has changed. We can fix this on the Kibana side as well.
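For illustration, a minimal sketch of the kind of port normalization the flow could apply before templating the host; this is shell under my own assumptions, not the actual Kibana code:

```sh
# Append the default HTTPS port when the host has no explicit one,
# so the agent does not fall back to 9200 (illustrative only)
if ! printf '%s' "$ES_HOST" | grep -Eq ':[0-9]+$'; then
  ES_HOST="${ES_HOST}:443"
fi
```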
Following the Kubernetes onboarding flow on serverless (Add data > Monitor Infrastructure > Kubernetes) doesn't ship data. This can be reproduced on a serverless Observability project and was tested with minikube running on Mac.
The logs show lots of errors like this:
It's possible this is a problem on the Kibana side of the flow as well; starting here for troubleshooting, and we can move the issue if it turns out to be unrelated.
A suspicion is that this is related to resourcing and that the agent now needs more memory, but this needs to be confirmed.
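One way to check that suspicion is to compare the agent pods' live memory usage with the configured limit (a sketch; namespace and label are assumptions from the default standalone manifests, and kubectl top requires metrics-server, e.g. via minikube addons enable metrics-server):

```sh
# Live memory usage of the agent pods
kubectl top pods -n kube-system -l app=elastic-agent-standalone

# Configured memory limit on the DaemonSet
kubectl -n kube-system get daemonset elastic-agent-standalone \
  -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'
```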