elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Filebeat stops harvesting logs #13607

Open cdalexndr opened 5 years ago

cdalexndr commented 5 years ago
  1. Filebeat harvested logs correctly
  2. Restarted logstash
  3. New logs appear in Kibana
  4. After some time, logs are no longer harvested from one docker container, although new log entries are written to its log file

Note that logstash correctly receives logs from other containers harvested by filebeat. The Filebeat harvester didn't recover even after a few hours.

Filebeat configuration:

output.logstash:
  hosts: ["logstash:5044"]
setup.kibana.host: http://kibana:5601
filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

Docker compose service:

web:
    image: app:latest
    expose:
      - 80
    labels:
      co.elastic.logs/multiline.pattern: '^[[:space:]]+(at|\\.{3})\\b|^Caused by:'
      co.elastic.logs/multiline.negate: false
      co.elastic.logs/multiline.match: after
      app: web
    environment:
      ELASTIC_APM_SERVER_URLS: http://apm:8200

My docker container that is not being harvested has id: 3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c

This container was also restarted at 2019-09-11 15:19:03,983. The last log entry correctly processed has timestamp 2019-09-11 15:19:37,235. The next log entry, with timestamp 2019-09-11 15:20:15,276, and all subsequent entries are missing.

Filebeat relevant log:

2019-09-11T15:19:57.988Z        INFO    [monitoring]    log/log.go:145  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":28150,"time":{"ms":120}},"total":{"ticks":54510,"time":{"ms":150},"value":54510},"user":{"ticks":26360,"time":{"ms":30}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":13},"info":{"ephemeral_id":"17c964bc-009e-4840-89b6-4b9c511cdf14","uptime":{"ms":15900511}},"memstats":{"gc_next":8939216,"memory_alloc":4663776,"memory_total":6966547888},"runtime":{"goroutines":93}},"filebeat":{"events":{"active":-2,"added":114,"done":116},"harvester":{"open_files":5,"running":5}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":116,"batches":13,"total":116},"read":{"bytes":78},"write":{"bytes":31559}},"pipeline":{"clients":8,"events":{"active":0,"published":114,"total":114},"queue":{"acked":116}}},"registrar":{"states":{"current":10,"update":116},"writes":{"success":13,"total":13}},"system":{"load":{"1":1.19,"15":0.58,"5":0.86,"norm":{"1":0.2975,"15":0.145,"5":0.215}}}}}}
2019-09-11T15:20:04.554Z        INFO    input/input.go:149      input ticker stopped
2019-09-11T15:20:04.554Z        INFO    input/input.go:167      Stopping Input: 9894227653855818847
2019-09-11T15:20:04.554Z        INFO    log/harvester.go:274    Reader was closed: /var/lib/docker/containers/3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c/3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c-json.log. Closing.
2019-09-11T15:20:27.995Z        INFO    [monitoring]    log/log.go:145  Non-zero metrics in the last 30s        {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":28160,"time":{"ms":10}},"total":{"ticks":54610,"time":{"ms":100},"value":54610},"user":{"ticks":26450,"time":{"ms":90}}},"handles":{"limit":{"hard":1048576,"soft":1048576},"open":12},"info":{"ephemeral_id":"17c964bc-009e-4840-89b6-4b9c511cdf14","uptime":{"ms":15930517}},"memstats":{"gc_next":5878912,"memory_alloc":7557664,"memory_total":6986384152,"rss":-2998272},"runtime":{"goroutines":82}},"filebeat":{"events":{"added":31,"done":31},"harvester":{"closed":1,"open_files":4,"running":4}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":30,"batches":12,"total":30},"read":{"bytes":72},"write":{"bytes":17806}},"pipeline":{"clients":7,"events":{"active":0,"filtered":1,"published":30,"total":31},"queue":{"acked":30}}},"registrar":{"states":{"current":10,"update":31},"writes":{"success":13,"total":13}},"system":{"load":{"1":1.24,"15":0.6,"5":0.89,"norm":{"1":0.31,"15":0.15,"5":0.2225}}}}}}

Logstash restarted timeline log:

[2019-09-11T15:16:36,298][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"7.3.0"}
..................
[2019-09-11T15:16:45,397][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

Filebeat registry entry:

  {
    "source": "/var/lib/docker/containers/3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c/3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c-json.log",
    "offset": 794342,
    "timestamp": "2019-09-11T15:20:04.5542604Z",
    "ttl": -1,
    "type": "container",
    "meta": null,
    "FileStateOS": {
      "inode": 802588,
      "device": 2049
    }
  }

Log file stats:

sh-4.2# stat 3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c-json.log
  File: '3252b7646a23293b6728941769f0412e2bd4b74b801ee09ab747c7cdfa74550c-json.log'
  Size: 911932          Blocks: 1784       IO Block: 4096   regular file
Device: 801h/2049d      Inode: 802588      Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-09-11 17:39:48.711393000 +0000
Modify: 2019-09-11 17:19:37.112086700 +0000
Change: 2019-09-11 17:19:37.112086700 +0000
 Birth: -

I hope I've provided enough details to investigate this problem.


Elastic stack 7.3.0

cdalexndr commented 5 years ago

After restarting the filebeat docker service, the missing logs are populated correctly. Filebeat should have harvested these logs without needing a restart.

r2r200 commented 4 years ago

I have the same issue on versions from 7.2 to 7.6.2. I have 3 similar containers with filebeat in nomad. One of them has a high load (~5000 messages per minute), and it gets stuck about once a day. Restarting the filebeat container helps.

henriqueyun commented 4 years ago

I am also facing this problem with filebeat 7.0.1. It runs and harvests for a while and eventually stops with those messages.

Zstro commented 4 years ago

I am also facing the same problem with filebeat 7.7.1. It harvested logs after a restart, but only once.

nerddelphi commented 4 years ago

I've been facing the same issue since 7.6.1, through 7.8.1 and 7.8.2-SNAPSHOT, and now with 7.9.0, even after a lot of PRs trying to fix it.

My issue is related to k8s autodiscover.

pauvos commented 4 years ago

Happens to me as well (7.9.0, docker autodiscover). Filebeat closes and reopens the missing container's log file several times due to inactivity and then simply stops watching the file after a while.

Just curious, do you have log rotation enabled for docker (I do...)?

$ cat /etc/docker/daemon.json
{
    "log-driver": "json-file",
    "log-opts": {
        "max-size": "100m",
        "max-file": "3"
    }
}

elasticmachine commented 4 years ago

Pinging @elastic/integrations-platforms (Team:Platforms)

jsoriano commented 4 years ago

The original issue reported in 7.3.0, of inputs being incorrectly stopped on container restarts, was fixed by #13127 in 7.4.0.

jsoriano commented 4 years ago

@pauvos could you share the autodiscover configuration you are using?

cdalexndr commented 3 years ago

Just happened again on 7.10.1

I'm no longer using logstash, and the problem was with the nginx (swag) container. Logs are stored in files inside the container, and I attach a volume to filebeat so that it can access them. The paths are correct and the files exist; they just aren't harvested. After a restart, the missing logs are populated.

My config:

output.elasticsearch:
  hosts: ["http://elasticsearch:9200"]
  index: "log-%{[docker][container][labels][app]}"
  pipeline: pipeline_logs
  ilm.enabled: false

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true
      templates:
        - condition:
            contains:
              docker.container.image: swag
          config:
            - module: nginx
              access:
                enabled: true
                var.paths: [ "/config/log/nginx/access.log*" ]
              error:
                enabled: true
                var.paths: [ "/config/log/nginx/error.log*" ]

Majroch commented 2 years ago

I have the same problem, but with Elastic Agent and Filebeat managed by a Fleet server. After some time (seemingly at random), filebeat just stops sending logs to my Elasticsearch cluster.

I'm using:

All configuration of every Elastic Agent on all servers is done through the Fleet Management tab inside Kibana. In the filebeat log files, under the /opt/Elastic/Agent/data/elastic-agent-XXXXXX/logs/default/filebeat-json.log* directory, I see harvester entries like this a bunch of times:

{"log.level":"info","@timestamp":"2022-01-13T06:00:24.459+0100","log.logger":"input.harvester","log.origin":{"file.name":"log/harvester.go","file.line":336},"message":"Reader was closed. Closing.","service.name":"filebeat","input_id":"6f476387-59f8-4ccc-8baa-38f0c646fd3d","source":"/var/log/apt/history.log.1.gz","state_id":"native::142434-64768","finished":false,"os_id":"142434-64768","old_source":"/var/log/apt/history.log.1.gz","old_finished":true,"old_os_id":"142434-64768","harvester_id":"4a471e46-b22a-42af-8d83-df53309cc81f","ecs.version":"1.6.0"}

Until today, I've just reinstalled the Elastic Agents with an Ansible playbook whenever that happened.

Avers90 commented 2 years ago

I had a similar issue. It helped me to reduce the logging level to warning:

    logging:
      level: warning
      metrics.enabled: false

MonicaMagoniCom commented 1 year ago

I'm experiencing the same issue with filebeat 8.4.3 with Kubernetes as the autodiscover provider. Are there any updates?

euskadi31 commented 1 year ago

We have the same problem with autodiscover; filebeat is installed on the host machine and reads the container logs.

filebeat version 7.16.2 (amd64), libbeat 7.16.2 [3c518f4d17a15dc85bdd68a5a03d5af51d9edd8e built 2021-12-18 21:04:19 +0000 UTC]

MonicaMagoniCom commented 1 year ago

I had a similar issue. It helped me to reduce the logging level to warning:

    logging:
      level: warning
      metrics.enabled: false

This does not solve the issue

belimawr commented 1 year ago

This seems to be related to https://github.com/elastic/beats/issues/34388, probably the same issue caused by autodiscover with the Kubernetes provider.

@MonicaMagoniCom Could you provide your configuration and some debug logs? A logging configuration like this should give enough information (I'm making some assumptions about your configuration here) and will not log any events ingested by Filebeat.

logging:
  level: debug
  selectors:
    - autodiscover
    - autodiscover.bus-filebeat
    - autodiscover.pod
    - beat
    - cfgwarn
    - crawler
    - hints.builder
    - input
    - input.filestream
    - input.harvester
    - kubernetes
    - modules
    - seccomp
    - service
belimawr commented 1 year ago

Regarding updates on this issue: we are aware of issues with autodiscover like the one I linked above, and it is on our backlog.

MonicaMagoniCom commented 1 year ago

We removed Kubernetes autodiscover from our filebeat configuration, since we were experiencing the issue, and replaced it with filebeat inputs; this is working correctly. So yes, the issue seems to be related to the Kubernetes autodiscover provider.
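For anyone doing the same, a minimal sketch of what such a static-input setup could look like; the paths and the add_kubernetes_metadata processor settings below are assumptions for illustration, not MonicaMagoniCom's actual configuration:

filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log           # placeholder path; adjust to your nodes
    processors:
      # enrich events with pod/namespace metadata without relying on autodiscover
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: "/var/log/containers/"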

euskadi31 commented 1 year ago

We have the same issue with docker autodiscover

belimawr commented 1 year ago

@euskadi31

We have the same issue with docker autodiscover

Could you provide some debug logs following this configuration:

logging:
  level: debug
  selectors:
    - autodiscover
    - autodiscover.bus-filebeat
    - autodiscover.pod
    - beat
    - cfgwarn
    - crawler
    - hints.builder
    - input
    - input.filestream
    - input.harvester
    - kubernetes
    - docker
    - modules
    - seccomp
    - service

Or at least look at your debug logs and see if you find a message like this:

Error creating runner from config: failed to create input: Can only start an input when all related states are finished: {Id: ea745ab688be85a9-native::3884203-2049, Finished: false, Fileinfo: &{frontend-86c8579b5b-mhnpg_helpdesk-frontend_frontend-mgmt-1cc73434a92abe9b93d9a3d971cfc4182e8ce64ac0e03f0c6e395875236fd514.log 374 416 {204820038 63804978609 0x56347552d700} {2049 3884203 1 33184 0 0 0 0 374 4096 8 {1669381808 728813408} {1669381809 204820038} {1669381809 204820038} [0 0 0]}}, Source: /var/log/containers/frontend-86c8579b5b-mhnpg_helpdesk-frontend_frontend-mgmt-1cc73434a92abe9b93d9a3d971cfc4182e8ce64ac0e03f0c6e395875236fd514.log, Offset: 0, Timestamp: 2023-01-19 13:38:27.166489276 +0000 UTC m=+58865.698641043, TTL: -1ns, Type: container, Meta: map[stream:stdout], FileStateOS: 3884203-2049}

MonicaMagoniCom commented 1 year ago

Yes, when we had the problem, we were seeing this error in the debug logs.

euskadi31 commented 1 year ago

@belimawr Yes, I will check it next week and let you know.

toms-place commented 1 year ago

Hi, is there any progress?

We are using beats version 8.4.3 and experiencing the same problem. We see a lot of messages saying "End of file reached. Closing because close_eof is enabled.", although we have already manually set close_eof to false.

belimawr commented 1 year ago

@toms-place the issue I mentioned (https://github.com/elastic/beats/issues/34388) has been fixed in v8.8.2. Could you update to the latest version and verify if the issue still persists?

Regarding close_eof: are you using the log input, the container input, or the filestream input? The log and container inputs use the close_eof configuration key (docs), whereas the filestream input uses close.reader.on_eof (docs).
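For reference, a minimal sketch of the two equivalent settings side by side; the paths and the input id below are placeholders, not taken from this thread:

filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/*.log   # placeholder path
    close_eof: false                # key used by the log/container inputs

  - type: filestream
    id: example-filestream          # placeholder id
    paths:
      - /var/log/app/*.log          # placeholder path
    close.reader.on_eof: false      # filestream equivalent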

gaima8 commented 1 year ago

We're a paying customer and have a support ticket open with Elastic. We've been running 8.8.2 for a while now, sent debug logs from 8.8.2 about a month ago, and still have the problem. Today we upgraded to 8.10.2.

toms-place commented 1 year ago

@belimawr -> I will report back once we have updated our prod systems.

Regarding close_eof -> we are currently using the container input.

gaima8 commented 1 year ago

FWIW, we've been on filebeat 8.10.2 for a month now and haven't seen any instances of harvesting stopping. It hasn't been a true solid month of the same pods running continuously, as the pods most obviously affected get updated from time to time, but in the last week, for example, nothing was lost.

belimawr commented 1 year ago

Only with debug logs will I be able to dig into it. A few things can cause Filebeat to miss some logs:

  1. Log rotation happens too fast and close.on_state_change.removed (for filestream, or the equivalent for the log/container input) is enabled.
  2. Inode reuse. If log rotation occurs in a way that reuses the inode, Filebeat can start reading the new file from the wrong offset. The fingerprint file identity can help with that (see the sketch after this list).
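As a rough illustration of both points, a filestream configuration along these lines might be used; the id and paths are placeholders, and whether these settings fit depends on your rotation setup:

filebeat.inputs:
  - type: filestream
    id: container-logs                      # placeholder id
    paths:
      - /var/log/containers/*.log           # placeholder path
    # 1. keep the reader open when the rotated file is removed
    close.on_state_change.removed: false
    # 2. identify files by a content fingerprint instead of inode+device,
    #    so inode reuse after rotation doesn't resume at a stale offset
    file_identity.fingerprint: ~
    prospector.scanner.fingerprint.enabled: true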

botelastic[bot] commented 1 week ago

Hi! We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:. Thank you for your contribution!