elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Err: Registry already stopped prevents harvester from starting on existing file #31734

Open fdartayre opened 2 years ago

fdartayre commented 2 years ago

Filebeat 7.16.3 with the following autodiscover configuration:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      scope: node
      templates:
        - config:
            - type: container
              paths:
              - /var/log/containers/*-${data.kubernetes.container.id}.log  
              clean_inactive: 72h
              close_inactive: 15m
              close_timeout: 24h
              close_renamed: false
              close_removed: false
              clean_removed: false
              ignore_older: 48h
              harvester_buffer_size: 65536
              max_bytes: 1048576
              # End default_config
              add_resource_metadata:
                namespace:
                  enabled: true
            - type: log
              paths:
              - /var/log/messages
              clean_inactive: 72h
              close_inactive: 15m
              close_timeout: 24h
              close_renamed: false
              close_removed: false
              clean_removed: false
              ignore_older: 48h
              harvester_buffer_size: 65536
              max_bytes: 1048576

sporadically prints the error Harvester could not be started on existing file with Err: registry already stopped:

2022-05-13T06:20:42.163Z  INFO  [input] log/input.go:171  Configured paths: [/var/log/containers/*-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log]  {"input_id": "a1036cef-c8e6-52fd-832f-81ef84ab3e45"}
2022-05-13T06:20:42.164Z  ERROR [input] log/input.go:582  Harvester could not be started on existing file: /var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log, Err: registry already stopped  {"input_id": "a1036cef-c8e6-52fd-832f-81ef84ab3e45", "source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "state_id": "native::44397250-66305", "finished": false, "os_id": "44397250-66305", "old_source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "old_finished": true, "old_os_id": "44397250-66305"}
2022-05-13T06:20:42.165Z  INFO  input/input.go:136  input ticker stopped
2022-05-13T06:20:42.206Z  INFO  [input] log/input.go:171  Configured paths: [/var/log/containers/*-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log]  {"input_id": "638df06e-d213-4732-8654-b6f9b9c69610"}
2022-05-13T06:20:42.247Z  INFO  [input] log/input.go:171  Configured paths: [/var/log/containers/*-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log]  {"input_id": "f75bb80a-3c0c-4b2c-a324-3836c4d6d590"}
2022-05-13T06:20:42.248Z  INFO  [input.harvester] log/harvester.go:309  Harvester started for file. {"input_id": "f75bb80a-3c0c-4b2c-a324-3836c4d6d590", "source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "state_id": "native::44397250-66305", "finished": false, "os_id": "44397250-66305", "old_source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "old_finished": true, "old_os_id": "44397250-66305", "harvester_id": "1ae57f03-6766-4fb0-a66a-6e0a2e4998fe"}
2022-05-13T06:20:47.066Z  INFO  input/input.go:136  input ticker stopped
2022-05-13T06:20:47.066Z  INFO  [input.harvester] log/harvester.go:336  Reader was closed. Closing. {"input_id": "f75bb80a-3c0c-4b2c-a324-3836c4d6d590", "source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "state_id": "native::44397250-66305", "finished": false, "os_id": "44397250-66305", "old_source": "/var/log/containers/thanos-addon-compactor-984564849-8zkb7_thanos_compactor-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log", "old_finished": true, "old_os_id": "44397250-66305", "harvester_id": "1ae57f03-6766-4fb0-a66a-6e0a2e4998fe"}
2022-05-13T06:20:57.193Z  INFO  [input] log/input.go:171  Configured paths: [/var/log/containers/*-c6355ba659b77c2cff06f7133102b0350a117249ec0627be2ac2d16b8a05621f.log]  {"input_id": "a8f94e7e-da13-4dff-9565-8c31913207fa"}

This does not lead to data loss. @ChrsMark suggested creating a public GH issue so that someone from the team can investigate this further and see whether anything can be improved, most likely in the messaging/error handling, to avoid such confusion in the future.
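
For comparison, here is a minimal sketch of the same container template expressed with the newer filestream input, which the Filebeat docs recommend over the deprecated log input. Whether filestream avoids this particular error has not been verified here, and the id value is a required, user-chosen identifier:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      scope: node
      templates:
        - config:
            - type: filestream
              # filestream inputs require a unique id; keying it on the
              # container id keeps it stable across template re-renders
              id: container-${data.kubernetes.container.id}
              paths:
                - /var/log/containers/*-${data.kubernetes.container.id}.log
              # parse the container log format written by the runtime
              parsers:
                - container: ~
              # entries under /var/log/containers are symlinks
              prospector.scanner.symlinks: true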

bd67 commented 1 year ago

In my case the filebeat service stops after encountering these errors, and I have to start it again manually. Is there any solution or workaround for this?
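
One possible stopgap for the service stopping, assuming filebeat runs as a systemd service without a restart policy (the unit name and drop-in path below are assumptions), is to let systemd restart it on failure:

# /etc/systemd/system/filebeat.service.d/override.conf
[Service]
# restart filebeat automatically if it exits with an error
Restart=on-failure
RestartSec=5s

followed by systemctl daemon-reload and systemctl restart filebeat. This only masks the underlying problem; it does not fix the registry error itself.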

mhadidg commented 1 year ago

I see a lot of these errors in my deployment. I suspect that this particular error causes data loss, due to the harvester's failure to monitor some files!

I have tried playing with the Filebeat configuration for quite some time to fix this, without any luck!


Error Logs

{"log.level":"error","@timestamp":"2022-10-05T20:06:07.568Z","log.logger":"input","log.origin":{"file.name":"log/input.go","file.line":583},"message":"Harvester could not be started on existing file: /var/lib/docker/containers/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2-json.log, Err: registry already stopped","service.name":"filebeat","input_id":"d13881e6-df14-458a-b5eb-ebb827fbad2b","source_file":"/var/lib/docker/containers/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2-json.log","state_id":"native::2179301-2048","finished":false,"os_id":"2179301-2048","old_source":"/var/lib/docker/containers/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2-json.log","old_finished":true,"old_os_id":"2179301-2048","ecs.version":"1.6.0"}
---
{"log.level":"error","@timestamp":"2022-10-05T19:53:48.469Z","log.logger":"input","log.origin":{"file.name":"log/input.go","file.line":558},"message":"Harvester could not be started on new file: /var/log/containers/elastic-operator-0_logging_manager-dc64f4b6972377cf0c4eb3296ead7de731dbff2a65eaf86f59e1821d435ea807.log, Err: registry already stopped","service.name":"filebeat","input_id":"8bf88eb0-3d81-4f79-b9df-4fa12104068d","source_file":"/var/log/containers/elastic-operator-0_logging_manager-dc64f4b6972377cf0c4eb3296ead7de731dbff2a65eaf86f59e1821d435ea807.log","state_id":"native::2565589-2048","finished":false,"os_id":"2565589-2048","ecs.version":"1.6.0"}
boernd commented 1 year ago

We also have this issue: sometimes (interestingly, not always) when this error appears, forwarding of specific pod logs stops, while logs of other pods processed by the same Filebeat instance continue to be forwarded. In Filebeat's harvester metrics we see a discrepancy between open_files and the running state.
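
For anyone who wants to watch those counters themselves, a minimal sketch: Filebeat's local monitoring endpoint can be enabled in filebeat.yml (5066 is the default port):

http:
  enabled: true
  host: localhost
  port: 5066

after which something like curl -s http://localhost:5066/stats | jq '.filebeat.harvester' shows the open_files and running counters side by side (the jq filter is just one way to read the JSON).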

nuno-silva commented 1 year ago

I'm seeing this error using filebeat 8.4.2 / k8s 1.22. It happened on a pod that ran for only 2 seconds.

{
  "log.level": "error",
  "@timestamp": "2023-02-22T00:15:03.514Z",
  "log.logger": "input",
  "log.origin": {
    "file.name": "log/input.go",
    "file.line": 558
  },
  "message": "Harvester could not be started on new file: /var/log/containers/xxxxx-cj-xxxxx-27950415-kgd2s_xxxxyyyyy-035eb9009eccf1e53bd225023373b175f8fbea147bc964b6b7482c3ca3cb533f.log, Err: registry already stopped",
  "service.name": "filebeat",
  "input_id": "f586cc7c-c1bc-43c9-beaf-066fc480fce5",
  "source_file": "/var/log/containers/xxxxx-cj-xxxxx-27950415-kgd2s_xxxxyyyyy-035eb9009eccf1e53bd225023373b175f8fbea147bc964b6b7482c3ca3cb533f.log",
  "state_id": "native::30490683-66305",
  "finished": false,
  "os_id": "30490683-66305",
  "ecs.version": "1.6.0"
}
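
A quick way to exercise the short-lived-pod scenario described above (the pod name and image here are arbitrary) is a pod that exits almost immediately:

kubectl run short-lived-test --restart=Never --image=busybox -- sh -c 'echo hello; sleep 1'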
rchaudhary24 commented 1 year ago

Hi, I am receiving the same error with filebeat version 7.13.1. Is there any fix for this issue?

2023-03-24T00:36:37.446Z        INFO    log/input.go:157        Configured paths: [/var/log/containers/*13c60d48764517ea78ed929f8404a616d2b19e3d3e1c878e56c63ce1a2ebdbf2.log]
2023-03-24T00:36:44.090Z        ERROR   log/input.go:554        Harvester could not be started on existing file: /var/log/containers/usage-collector-parser-service-86cb7b6cc7-lq2ln_usage-collector-prod-au_usage-collector-parser-service-69dcac15057350e89a6cc13c4a5c2cd76813151bbf41e45b7e31a32137f0438b.log, Err: registry already stopped
2023-03-24T00:36:44.090Z        INFO    input/input.go:136      input ticker stopped
2023-03-24T00:36:44.090Z        ERROR   log/input.go:554        Harvester could not be started on existing file: /var/log/containers/usage-collector-parser-service-86cb7b6cc7-lq2ln_usage-collector-prod-au_istio-proxy-aa7b992875c1194f38933cd870e346c5843c0e98bbf7e36f93512d8ab4f79f52.log, Err: registry already stopped
2023-03-24T00:36:44.090Z        INFO    input/input.go:136      input ticker stopped
2023-03-24T00:36:44.090Z        INFO    input/input.go:136      input ticker stopped
todeb commented 10 months ago

I have the same issue, using docker.elastic.co/beats/filebeat:8.3.3

{"log":"{\"log.level\":\"error\",\"@timestamp\":\"2023-08-16T14:42:59.576Z\",\"log.logger\":\"input\",\"log.origin\":{\"file.name\":\"log/input.go\",\"file.line\":583},\"message\":\"Harvester could not be started on existing file: /var/lib/docker/containers/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8-json.log, Err: registry already stopped\",\"service.name\":\"filebeat\",\"input_id\":\"4cbb1204-bfd7-48f5-9db3-82ea899ebb36\",\"source_file\":\"/var/lib/docker/containers/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8-json.log\",\"state_id\":\"native::5767743-64768\",\"finished\":false,\"os_id\":\"5767743-64768\",\"old_source\":\"/var/lib/docker/containers/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8-json.log\",\"old_finished\":true,\"old_os_id\":\"5767743-64768\",\"ecs.version\":\"1.6.0\"}\n","stream":"stderr","time":"2023-08-16T14:42:59.576491143Z"}

It causes logs not to be shipped, so they are lost.

@tetianakravchenko any update on that issue?

PatrikPM commented 1 month ago

Getting the exact same thing. Filebeat sporadically stops processing some pod logs, while others on the same node work fine. Restarting the filebeat DaemonSet solves the issue.
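
For reference, the restart can be done without deleting pods by hand; a one-liner, assuming the DaemonSet is named filebeat and lives in kube-system (adjust both to your deployment):

kubectl -n kube-system rollout restart daemonset/filebeat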

PatrikPM commented 1 month ago

After upgrading to Filebeat 8.11.3 this was solved, for me at least.