elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Inventory showing elasticagent name instead of actual cluster instance name #198751

Open sruthisattiraju opened 2 weeks ago

sruthisattiraju commented 2 weeks ago

Hi Team,

For the agents installed on Azure Kubernetes (AKS), the inventory list currently shows the Elastic Agent names, but I would like to see the actual worker node names listed instead. At the moment host.name contains the Elastic Agent name (elastic-agent-dev-agent-zs9dl); I need host.name to be aks-apppool-85152817-vmss_105, which is mapped to cloud.instance.name.

Where do I make the configuration changes so that the inventory reflects the cluster details? Please suggest.

[Three screenshots attached.]
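Not an official answer, but one possible workaround sketch for readers: Elastic Agent inputs accept Beats-style processors, so a standalone agent configuration could copy cloud.instance.name over host.name at collection time. The field names below come from this issue; whether overriding host.name is safe for your other dashboards is an assumption you should verify.

```yaml
# Sketch only: Beats-style copy_fields processor in a standalone
# Elastic Agent input config. Overwriting host.name is an assumption,
# not a recommended or confirmed fix for this issue.
processors:
  - copy_fields:
      fields:
        - from: cloud.instance.name
          to: host.name
      fail_on_error: false   # do not drop the event if the copy fails
      ignore_missing: true   # skip events without cloud.instance.name
```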

elasticmachine commented 2 weeks ago

Pinging @elastic/fleet (Team:Fleet)

elasticmachine commented 2 weeks ago

Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)

cmacknz commented 2 weeks ago

Make sure you have hostNetwork: true in your Kubernetes YAML (see here). That will let the agent see the node network and the node hostname.
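For reference, a minimal sketch of where that setting sits in the Elastic Agent DaemonSet pod spec. The field placement is standard Kubernetes; the metadata names and image tag here are illustrative, not taken from this cluster.

```yaml
# Sketch only: placement of hostNetwork in an Elastic Agent DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: elastic-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: elastic-agent
  template:
    metadata:
      labels:
        app: elastic-agent
    spec:
      # Share the node's network namespace so the agent reports the
      # node's hostname instead of the pod name.
      hostNetwork: true
      containers:
        - name: elastic-agent
          image: docker.elastic.co/elastic-agent/elastic-agent:8.15.0
```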

sruthisattiraju commented 2 weeks ago

Hi,

I have updated the YAML with hostNetwork: true and applied it, but after the change the agent pods go into CrashLoopBackOff. Once I remove the new entry and apply the YAML again, the pods run fine.

```
kubectl.exe logs -f elastic-agent-dev-agent-2676x
Updating certificates in /etc/ssl/certs...
rehash: warning: skipping ca-certificates.crt, it does not contain exactly one certificate or CRL
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d... done.
{...,"@timestamp":"2024-11-06T06:56:40.428Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/cmd.(*enrollCmd).enrollWithBackoff","file.name":"cmd/enroll_cmd.go","file.line":518},"message":"Starting enrollment to URL: https://fleet-server-dev-agent-http.elastic.svc:8220/","ecs.version":"1.6.0"}
{...,"@timestamp":"2024-11-06T06:56:40.710Z","log.logger":"transport","log.origin":{"function":"github.com/elastic/elastic-agent-libs/transport/httpcommon.(*HTTPTransportSettings).RoundTripper.NetDialer.TestNetDialer.func3","file.name":"transport/tcp.go","file.line":53},"message":"DNS lookup failure \"fleet-server-dev-agent-http.elastic.svc\": lookup fleet-server-dev-agent-http.elastic.svc on 168.63.129.16:53: no such host","ecs.version":"1.6.0"}
{...,"@timestamp":"2024-11-06T06:56:40.710Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/cmd.(*enrollCmd).enrollWithBackoff","file.name":"cmd/enroll_cmd.go","file.line":524},"message":"1st enrollment attempt failed, retrying enrolling to URL: https://fleet-server-dev-agent-http.elastic.svc:8220/ with exponential backoff (init 1s, max 10s)","ecs.version":"1.6.0"}
Error: fail to enroll: fail to execute request to fleet-server: lookup fleet-server-dev-agent-http.elastic.svc on 168.63.129.16:53: no such host
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.15/fleet-troubleshooting.html
Error: enrollment failed: exit status 1
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.15/fleet-troubleshooting.html
```

I am attaching the YAML file I am using; please validate it and help me understand whether I am entering the setting in the right location.
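A note on the crash loop above, as an editorial observation rather than a confirmed diagnosis: the enrollment failure is a DNS lookup against 168.63.129.16, the Azure-provided node resolver, which suggests that with hostNetwork: true the pod inherited the node's DNS and can no longer resolve the in-cluster service name fleet-server-dev-agent-http.elastic.svc. In standard Kubernetes, the usual companion setting for host-network pods that still need cluster DNS is dnsPolicy: ClusterFirstWithHostNet. A sketch of that combination, under the assumption it applies to this manifest:

```yaml
# Sketch only: pod spec fragment for a host-network pod that must still
# resolve in-cluster service names. Not a confirmed fix for this issue.
spec:
  template:
    spec:
      hostNetwork: true
      # With hostNetwork: true, the default ClusterFirst policy falls back
      # to the node's resolver, so names like
      # fleet-server-dev-agent-http.elastic.svc stop resolving. This policy
      # keeps cluster DNS for host-network pods.
      dnsPolicy: ClusterFirstWithHostNet
```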

Thanks & Regards,
Sruthi Sattiraju

sruthisattiraju commented 2 weeks ago

Team, after I added hostNetwork: true the agent pods went into CrashLoopBackOff as described above, so I reverted the YAML and applied it again. Since then, the agents on Kubernetes are failing to connect to Fleet and Elasticsearch. The issue is only on Kubernetes; agents on VMs are still able to send data to Elasticsearch. Something broke after I applied the hostNetwork change. How do I fix it now?

I am getting the below errors in the Fleet Server logs:

```
{"log.level":"error","@timestamp":"2024-11-07T06:55:44.471Z","message":"http: TLS handshake error from 10.3.0.208:57024: read tcp 10.3.3.7:8220->10.3.0.208:57024: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-07T06:55:44.573Z","log.origin":{"function":"github.com/elastic/elastic-agent/internal/pkg/agent/application/coordinator.(*Coordinator).watchRuntimeComponents","file.name":"coordinator/coordinator.go","file.line":647},"message":"Component state changed filestream-monitoring (DEGRADED->FAILED): Failed: pid '72336' missed 3 check-ins and will be killed","log":{"source":"elastic-agent"},"component":{"id":"filestream-monitoring","state":"FAILED","old_state":"DEGRADED"},"ecs.version":"1.6.0"}
```
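Editorial note: one way to narrow down TLS handshake timeouts like those above is to call the Fleet Server status endpoint directly from an affected agent pod. The pod name and CA certificate path below are the ones shown elsewhere in this thread; /api/status is the standard Fleet Server health endpoint. This is a diagnostic sketch against a live cluster, not a verified command sequence.

```shell
# Diagnostic sketch: probe Fleet Server from inside an agent pod.
# Pod name and CA path follow the examples in this thread.
kubectl exec -it elastic-agent-dev-agent-rc26f -- \
  curl -sv \
    --cacert /mnt/elastic-internal/fleetserver-association/elastic/fleet-server-dev/certs/ca.crt \
    https://fleet-server-dev-agent-http.elastic.svc:8220/api/status
```

If the TLS handshake completes and a status body is returned, the listener is reachable from that pod; a hang before the handshake would instead point at the network path between the agents and Fleet Server.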

sruthisattiraju commented 1 week ago

Team, any update on this issue? I see the below errors in the Fleet Server log:

```
{"log.level":"error","@timestamp":"2024-11-11T06:50:07.387Z","message":"http: TLS handshake error from 10.3.0.179:18013: read tcp 10.3.2.43:8220->10.3.0.179:18013: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-11T06:50:07.387Z","message":"http: TLS handshake error from 10.3.2.151:59710: read tcp 10.3.2.43:8220->10.3.2.151:59710: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-11T06:50:07.788Z","message":"http: TLS handshake error from 10.3.0.120:57039: read tcp 10.3.2.43:8220->10.3.0.120:57039: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-11T06:50:08.388Z","message":"http: TLS handshake error from 10.3.1.142:42707: read tcp 10.3.2.43:8220->10.3.1.142:42707: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-11T06:50:08.889Z","message":"http: TLS handshake error from 10.3.0.33:13134: read tcp 10.3.2.43:8220->10.3.0.33:13134: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-11-11T06:50:08.988Z","message":"http: TLS handshake error from 10.3.2.33:44906: read tcp 10.3.2.43:8220->10.3.2.33:44906: i/o timeout\n","component":{"binary":"fleet-server","dataset":"elastic_agent.fleet_server","id":"fleet-server-default","type":"fleet-server"},"log":{"source":"fleet-server-default"},"service.name":"fleet-server","service.type":"fleet-server","ecs.version":"1.6.0"}
```

But when I check the connection from the Elastic Agents to Fleet Server, it seems to be fine:

```
kubectl.exe exec -it elastic-agent-dev-agent-rc26f -- bash
root@elastic-agent-dev-agent-rc26f:/usr/share/elastic-agent# openssl s_client -connect fleet-server-dev-agent-http.elastic.svc:8220 -CAfile /mnt/elastic-internal/fleetserver-association/elastic/fleet-server-dev/certs/ca.crt
CONNECTED(00000003)
depth=1 OU = fleet-server-dev, CN = fleet-server-dev-http
verify return:1
depth=0 OU = fleet-server-dev, CN = fleet-server-dev-agent-http.elastic.agent.local
verify return:1

Certificate chain
 0 s:OU = fleet-server-dev, CN = fleet-server-dev-agent-http.elastic.agent.local
   i:OU = fleet-server-dev, CN = fleet-server-dev-http
 1 s:OU = fleet-server-dev, CN = fleet-server-dev-http
   i:OU = fleet-server-dev, CN = fleet-server-dev-http

Server certificate
-----BEGIN CERTIFICATE-----
MIIETDCCAzSgAwIBAgIRAMQ3grYkUNKJCvDcjm1VxbQwDQYJKoZIhvcNAQELBQAw
OzEZMBcGA1UECxMQZmxlZXQtc2VydmVyLWRldjEeMBwGA1UEAxMVZmxlZXQtc2Vy
```

Can I get any help fixing this issue? This all started after I included hostNetwork: true in the YAML file.