elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

[Fleet] Agent fails when host disk space is full, need better support #293

Closed EricDavisX closed 1 month ago

EricDavisX commented 3 years ago

A user brought this to us and I am logging a quick ticket to capture minimal details.

Apparently the Agent failed to install metricbeat, due to a lack of disk space.

It wasn't clear immediately, but some subsequent log diving shows the reason:

/var/lib/elastic-agent/logs/elastic-agent-json.log.2:{"log.level":"error","@timestamp":"2020-11-11T11:13:12.997-0500","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2020-11-11T11:13:12-05:00: type: 'ERROR': sub_type: 'FAILED' message: Application: filebeat--7.9.3--36643631373035623733363936343635[2ff0699f-4ef0-4d57-84b3-053a760c711e]: State changed to FAILED: TarInstaller: error writing to /var/lib/elastic-agent/install/filebeat-7.9.3-linux-x86_64/NOTICE.txt: write /var/lib/elastic-agent/install/filebeat-7.9.3-linux-x86_64/NOTICE.txt: no space left on device","ecs.version":"1.5.0"}

What should we expect of Elastic Agent here? I'm not sure what it can do, except perhaps purge old log files. What else can we think of? And what should be shown in the Activity log, etc.?

Thanks @P1llus for bringing it to us in Slack.

elasticmachine commented 3 years ago

Pinging @elastic/ingest-management (Team:Ingest Management)

EricDavisX commented 3 years ago

Comments from Slack: 1) This happened to me several times over the past few days. I "think" it is unable to track its state because it cannot write the file, and it just goes crazy. I had to reinstall every time this happened to me.

2) It might be that it needs to enforce failure earlier in the enrollment process and report failure instead of success? I don't think running out of disk space is the only use case that might hit this; it could be write permissions as well.

ph commented 3 years ago

As @michalpristas mentioned, I think the "no space left on device" message should be shown in Fleet and in the Agent log. Is this not the case?

From what I understand, Elastic Agent installed successfully but is not able to bootstrap Filebeat.

elasticmachine commented 3 years ago

Pinging @elastic/agent (Team:Agent)

jlind23 commented 1 month ago

Closing this as outdated / not relevant anymore. In the meantime, we are working on a lightweight Elastic Agent binary to minimize the occurrences of such scenarios. cc @nimarezainia