Open harshitgupta-qasource opened 8 months ago
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
@amolnater-qasource Kindly review
Secondary Review for this ticket is Done.
It looks like we must have gotten a check-in that did not include the endpoint unit state:
We are missing the endpoint logs, which would be needed to debug this. That is a separate bug:
Hi @cmacknz
We have fetched the endpoint logs manually from the host, Please find attached below: log.zip
Thanks
It looks like the endpoint logs were captured after the problem occurred, the first timestamp in those logs is after the first set of problems started in the agent logs.
If you are able to reproduce this, the bug preventing endpoint logs from being included in diagnostics has been fixed now.
Hi @cmacknz
We have re-validated this issue on latest 8.13.0 BC4 Kibana Cloud build and found that the issue is still reproducible.
Please find attached the latest agent logs with endpoint below: elastic-agent-diagnostics-2024-03-14T09-13-58Z-00.zip
Build details: VERSION: 8.13.0 BC4 BUILD: 71907 COMMIT: a44ba7e18a1b025d98f9a789f8177c1f5774d04e Artifact: https://staging.elastic.co/8.13.0-6a2fcd0a/downloads/beats/elastic-agent/elastic-agent-8.13.0-windows-x86_64.zip
Please let us know if anything else is required from our end. Thanks
Thanks, but the endpoint logs missing was fixed in BC5, so in those diagnostics the endpoint logs are still missing.
Hi @cmacknz
Please find attached latest agent logs with elastic defend integration that validated on latest 8.13.0 BC5 kibana cloud build. elastic-agent-diagnostics-2024-03-19T11-15-34Z-00.zip
Build details: VERSION: 8.13.0 BC5 BUILD: 72025 COMMIT: d04713ae6c5fd43086bc488edd406ca50d4b9709 Artifact: https://staging.elastic.co/8.13.0-f75ccb77/summary-8.13.0.html#elastic-agent-package
Thanks
Thanks! Those have what I need.
Here's the first occurence in the agent logs:
{"log.level":"warn","@timestamp":"2024-03-19T06:35:00.843Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":606},"message":"Component state changed endpoint-default (HEALTHY->DEGRADED): Degraded: endpoint service missed 1 check-in","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"DEGRADED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-03-19T06:35:04.140Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":624},"message":"Unit state changed endpoint-default-06788e77-6fb9-447e-9814-5d05a97d8471 (HEALTHY->FAILED): Failed: not reported in check-in","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"DEGRADED"},"unit":{"id":"endpoint-default-06788e77-6fb9-447e-9814-5d05a97d8471","type":"input","state":"FAILED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-03-19T06:35:04.140Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":624},"message":"Unit state changed endpoint-default (HEALTHY->FAILED): Failed: not reported in check-in","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"DEGRADED"},"unit":{"id":"endpoint-default","type":"output","state":"FAILED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-03-19T06:35:07.600Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":624},"message":"Unit state changed endpoint-default-06788e77-6fb9-447e-9814-5d05a97d8471 (FAILED->HEALTHY): Applied policy {06788e77-6fb9-447e-9814-5d05a97d8471}","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"DEGRADED"},"unit":{"id":"endpoint-default-06788e77-6fb9-447e-9814-5d05a97d8471","type":"input","state":"HEALTHY","old_state":"FAILED"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-03-19T06:35:07.600Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":624},"message":"Unit state changed endpoint-default (FAILED->HEALTHY): Applied policy {06788e77-6fb9-447e-9814-5d05a97d8471}","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"DEGRADED"},"unit":{"id":"endpoint-default","type":"output","state":"HEALTHY","old_state":"FAILED"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-03-19T06:35:30.844Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":606},"message":"Component state changed endpoint-default (DEGRADED->HEALTHY): Healthy: communicating with endpoint service","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"HEALTHY","old_state":"DEGRADED"},"ecs.version":"1.6.0"}
Here are the endpoint logs around this time:
{"@timestamp":"2024-03-19T06:34:59.2030218Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":11436,"thread":{"id":2108}}}
{"@timestamp":"2024-03-19T06:35:00.217592Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":11436,"thread":{"id":2108}}}
{"@timestamp":"2024-03-19T06:35:01.2233155Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":11436,"thread":{"id":2108}}}
{"@timestamp":"2024-03-19T06:35:02.2344486Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":11436,"thread":{"id":2108}}}
{"@timestamp":"2024-03-19T06:35:03.2392269Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":11436,"thread":{"id":2108}}}
{"@timestamp":"2024-03-19T06:35:04.1375723Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":1048,"name":"AgentComms.cpp"}}},"message":"AgentComms.cpp:1048 Agent connection established.","process":{"pid":11436,"thread":{"id":11612}}}
{"@timestamp":"2024-03-19T06:35:04.1375723Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":56,"name":"BigChief.cpp"}}},"message":"BigChief.cpp:56 Agent process notification for protection: 5052","process":{"pid":11436,"thread":{"id":11612}}}
{"@timestamp":"2024-03-19T06:35:04.1375723Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":77,"name":"BigChief.cpp"}}},"message":"BigChief.cpp:77 Agent process [5052] protection result: Success","process":{"pid":11436,"thread":{"id":11612}}}
{"@timestamp":"2024-03-19T06:35:04.1375723Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":309,"name":"AgentContext.cpp"}}},"message":"AgentContext.cpp:309 Successfully created reader/writer for CheckinV2Api","process":{"pid":11436,"thread":{"id":10040}}}
{"@timestamp":"2024-03-19T06:35:04.1375723Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"warning","origin":{"file":{"line":478,"name":"AgentContext.cpp"}}},"message":"AgentContext.cpp:478 Endpoint is setting status to CONFIGURING, reason: Policy Application Status","process":{"pid":11436,"thread":{"id":10792}}}
{"@timestamp":"2024-03-19T06:35:04.142431Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":430,"name":"AgentV2PolicyBuilder.cpp"}}},"message":"AgentV2PolicyBuilder.cpp:430 Received new configuration from agent, building new config","process":{"pid":11436,"thread":{"id":10040}}}
It happens again here:
{"log.level":"warn","@timestamp":"2024-03-19T06:44:31.000Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":606},"message":"Component state changed endpoint-default (HEALTHY->DEGRADED): Degraded: endpoint service missed 1 check-in","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"DEGRADED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-03-19T06:44:40.120Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":624},"message":"Unit state changed endpoint-default-06788e77-6fb9-447e-9814-5d05a97d8471 (HEALTHY->FAILED): Failed: not reported in check-in","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"DEGRADED"},"unit":{"id":"endpoint-default-06788e77-6fb9-447e-9814-5d05a97d8471","type":"input","state":"FAILED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2024-03-19T06:44:40.120Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":624},"message":"Unit state changed endpoint-default (HEALTHY->FAILED): Failed: not reported in check-in","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"DEGRADED"},"unit":{"id":"endpoint-default","type":"output","state":"FAILED","old_state":"HEALTHY"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-03-19T06:44:44.318Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":624},"message":"Unit state changed endpoint-default-06788e77-6fb9-447e-9814-5d05a97d8471 (FAILED->HEALTHY): Applied policy {06788e77-6fb9-447e-9814-5d05a97d8471}","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"DEGRADED"},"unit":{"id":"endpoint-default-06788e77-6fb9-447e-9814-5d05a97d8471","type":"input","state":"HEALTHY","old_state":"FAILED"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-03-19T06:44:44.318Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":624},"message":"Unit state changed endpoint-default (FAILED->HEALTHY): Applied policy {06788e77-6fb9-447e-9814-5d05a97d8471}","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"DEGRADED"},"unit":{"id":"endpoint-default","type":"output","state":"HEALTHY","old_state":"FAILED"},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2024-03-19T06:45:01.001Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":606},"message":"Component state changed endpoint-default (DEGRADED->HEALTHY): Healthy: communicating with endpoint service","log":{"source":"elastic-agent"},"component":{"id":"endpoint-default","state":"HEALTHY","old_state":"DEGRADED"},"ecs.version":"1.6.0"}
The matching endpoint logs are:
{"@timestamp":"2024-03-19T06:44:27.7973601Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":406,"name":"Response.cpp"}}},"message":"Response.cpp:406 Policy action configure_response_actions: success - Successfully configured fleet API for response actions","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:27.7973601Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2425,"name":"Config.cpp"}}},"message":"Config.cpp:2425 Configuring metadata","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:27.7973601Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2425,"name":"Config.cpp"}}},"message":"Config.cpp:2425 Configuring cloudServices","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:27.8135229Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:28.8348852Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:29.8352501Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:30.8406001Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:31.8550782Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:32.7353612Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":572,"name":"LogStorage.cpp"}}},"message":"LogStorage.cpp:572 Unsynced document log store: C:\\Program Files\\Elastic\\Endpoint\\state\\documents\\documents-2024-03-19T063449.log","process":{"pid":10052,"thread":{"id":2248}}}
{"@timestamp":"2024-03-19T06:44:32.8710733Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:33.8807365Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:34.8879664Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:35.8966438Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:36.9029597Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:37.760251Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":572,"name":"LogStorage.cpp"}}},"message":"LogStorage.cpp:572 Unsynced document log store: C:\\Program Files\\Elastic\\Endpoint\\state\\documents\\documents-2024-03-19T063449.log","process":{"pid":10052,"thread":{"id":2248}}}
{"@timestamp":"2024-03-19T06:44:37.9042636Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:38.9114662Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:39.9250488Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":2511,"name":"Config.cpp"}}},"message":"Config.cpp:2511 Checking for agent connectivity","process":{"pid":10052,"thread":{"id":2168}}}
{"@timestamp":"2024-03-19T06:44:40.1179011Z","agent":{"id":"998d041a-b62b-4fd7-8fda-945a3d40cd52","type":"endpoint"},"ecs":{"version":"8.10.0"},"log":{"level":"info","origin":{"file":{"line":1048,"name":"AgentComms.cpp"}}},"message":"AgentComms.cpp:1048 Agent connection established.","process":{"pid":10052,"thread":{"id":11028}}}
I see endpoint reconnected to agent 26 times in the endpoint logs which span ~5 hours from 2024-03-19T06:23:00.684
to 2024-03-19T11:15:34.826
. That seems suspicious.
❯ rg 'Connecting to Agent' logs/services --count
logs/services/endpoint-000000.log:26
@brian-mckinney any idea what might be going on here on the endpoint side? Diagnostics and logs are in https://github.com/elastic/elastic-agent/issues/4370#issuecomment-2006970839.
Kibana Build details:
Host OS and Browser version: All, All
Preconditions:
Steps to reproduce:
Expected:
Screenshot:
Agents Logs: elastic-agent-diagnostics-2024-03-07T09-29-57Z-00.zip