elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Journald] input crashes with "failed to read message field: cannot allocate memory" #39352

Open belimawr opened 2 months ago

belimawr commented 2 months ago

Filebeat: 8.13.2
Host OS: Amazon Linux 2
Systemd/Journald version: systemd 252 (252.16-1.amzn2023.0.2)

journalctl --version
systemd 252 (252.16-1.amzn2023.0.2)
+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP -GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 -BZIP2 -LZ4 +XZ -ZLIB -ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

How to reproduce

  1. Flood journald with logs so that it rotates its files every minute or so. Mostly follow https://github.com/elastic/beats/issues/34077#issuecomment-2018404571
  2. Start Filebeat with the config from the above link
  3. Wait until Journald reaches its maximum number of files and starts deleting old entries
  4. Filebeat might crash due to https://github.com/elastic/beats/issues/34077. That is fine, ignore it
  5. Let the logs keep flowing for a while (I waited for hours)
  6. Start Filebeat again
  7. Journald input will fail with:
    {"log.level":"error","@timestamp":"2024-05-01T19:29:01.010Z","log.logger":"input.journald","log.origin":{"function":"github.com/elastic/beats/v7/filebeat/input/v2/compat.(*runner).Start.func1","file.name":"compat/compat.go","file.line":132},"message":"Input 'journald' failed with: input.go:130: input journald-input failed (id=journald-input)\n\tfailed to read message field: cannot allocate memory","service.name":"filebeat","id":"journald-input","ecs.version":"1.6.0"}

Sometimes Filebeat might just crash again. I also saw it failing once or twice with the same message as in https://github.com/elastic/beats/issues/32782.

Both seem to be related to Filebeat being too far behind in reading the journal, probably further behind than what journald still has stored on disk.

In both cases the error comes from the journald library we use, github.com/coreos/go-systemd/v22.

elasticmachine commented 2 months ago

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)