Open fdartayre opened 2 years ago
In my case the filebeat service stops after encountering these errors, and I have to start it again manually. Is there any solution or workaround for this?
I see a lot of these errors in my deployment. I suspect this particular error causes data loss, because the harvester fails to monitor some files!
I have tried adjusting the Filebeat configuration for quite some time to fix this, without any luck.
{"log.level":"error","@timestamp":"2022-10-05T20:06:07.568Z","log.logger":"input","log.origin":{"file.name":"log/input.go","file.line":583},"message":"Harvester could not be started on existing file: /var/lib/docker/containers/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2-json.log, Err: registry already stopped","service.name":"filebeat","input_id":"d13881e6-df14-458a-b5eb-ebb827fbad2b","source_file":"/var/lib/docker/containers/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2-json.log","state_id":"native::2179301-2048","finished":false,"os_id":"2179301-2048","old_source":"/var/lib/docker/containers/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2/6c8b07114a2fba7e87e31eab2c0f33e475f44371baf3fece3eeff9afe1e37de2-json.log","old_finished":true,"old_os_id":"2179301-2048","ecs.version":"1.6.0"}
---
{"log.level":"error","@timestamp":"2022-10-05T19:53:48.469Z","log.logger":"input","log.origin":{"file.name":"log/input.go","file.line":558},"message":"Harvester could not be started on new file: /var/log/containers/elastic-operator-0_logging_manager-dc64f4b6972377cf0c4eb3296ead7de731dbff2a65eaf86f59e1821d435ea807.log, Err: registry already stopped","service.name":"filebeat","input_id":"8bf88eb0-3d81-4f79-b9df-4fa12104068d","source_file":"/var/log/containers/elastic-operator-0_logging_manager-dc64f4b6972377cf0c4eb3296ead7de731dbff2a65eaf86f59e1821d435ea807.log","state_id":"native::2565589-2048","finished":false,"os_id":"2565589-2048","ecs.version":"1.6.0"}
We also have this issue: sometimes (interestingly, not always) when this error appears, forwarding of a specific pod's logs stops, while logs of other pods processed by the same Filebeat instance are still forwarded.
We also see in Filebeat's harvester metrics that there is a difference between the open_files and running counts.
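The open_files/running gap can be checked programmatically. A minimal sketch, assuming Filebeat is running with its HTTP monitoring endpoint enabled (http.enabled: true, default port 5066) and that the stats payload exposes the counters under filebeat.harvester as in the metrics above:

```python
import json
from urllib.request import urlopen


def harvester_gap(stats: dict) -> int:
    """Return open_files minus running from a Filebeat stats payload."""
    harvester = stats["filebeat"]["harvester"]
    return harvester["open_files"] - harvester["running"]


def fetch_stats(url: str = "http://localhost:5066/stats") -> dict:
    """Fetch the stats payload; requires Filebeat started with http.enabled: true."""
    with urlopen(url) as resp:
        return json.load(resp)


# Example (requires a local Filebeat with the HTTP endpoint enabled):
#   gap = harvester_gap(fetch_stats())
# A persistently positive gap suggests harvesters that hold files open
# but are no longer running, matching the symptom described above.
```

Polling this periodically and alerting on a persistent gap is one way to detect the stuck-harvester state without grepping logs.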
I'm seeing this error using filebeat 8.4.2 / k8s 1.22. It happened on a pod that ran for only 2 seconds.
{
"log.level": "error",
"@timestamp": "2023-02-22T00:15:03.514Z",
"log.logger": "input",
"log.origin": {
"file.name": "log/input.go",
"file.line": 558
},
"message": "Harvester could not be started on new file: /var/log/containers/xxxxx-cj-xxxxx-27950415-kgd2s_xxxxyyyyy-035eb9009eccf1e53bd225023373b175f8fbea147bc964b6b7482c3ca3cb533f.log, Err: registry already stopped",
"service.name": "filebeat",
"input_id": "f586cc7c-c1bc-43c9-beaf-066fc480fce5",
"source_file": "/var/log/containers/xxxxx-cj-xxxxx-27950415-kgd2s_xxxxyyyyy-035eb9009eccf1e53bd225023373b175f8fbea147bc964b6b7482c3ca3cb533f.log",
"state_id": "native::30490683-66305",
"finished": false,
"os_id": "30490683-66305",
"ecs.version": "1.6.0"
}
Hi, I am receiving the same error on Filebeat version 7.13.1. Is there any fix for this issue?
2023-03-24T00:36:37.446Z INFO log/input.go:157 Configured paths: [/var/log/containers/*13c60d48764517ea78ed929f8404a616d2b19e3d3e1c878e56c63ce1a2ebdbf2.log]
2023-03-24T00:36:44.090Z ERROR log/input.go:554 Harvester could not be started on existing file: /var/log/containers/usage-collector-parser-service-86cb7b6cc7-lq2ln_usage-collector-prod-au_usage-collector-parser-service-69dcac15057350e89a6cc13c4a5c2cd76813151bbf41e45b7e31a32137f0438b.log, Err: registry already stopped
2023-03-24T00:36:44.090Z INFO input/input.go:136 input ticker stopped
2023-03-24T00:36:44.090Z ERROR log/input.go:554 Harvester could not be started on existing file: /var/log/containers/usage-collector-parser-service-86cb7b6cc7-lq2ln_usage-collector-prod-au_istio-proxy-aa7b992875c1194f38933cd870e346c5843c0e98bbf7e36f93512d8ab4f79f52.log, Err: registry already stopped
2023-03-24T00:36:44.090Z INFO input/input.go:136 input ticker stopped
2023-03-24T00:36:44.090Z INFO input/input.go:136 input ticker stopped
I have the same issue using docker.elastic.co/beats/filebeat:8.3.3:
{"log":"{\"log.level\":\"error\",\"@timestamp\":\"2023-08-16T14:42:59.576Z\",\"log.logger\":\"input\",\"log.origin\":{\"file.name\":\"log/input.go\",\"file.line\":583},\"message\":\"Harvester could not be started on existing file: /var/lib/docker/containers/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8-json.log, Err: registry already stopped\",\"service.name\":\"filebeat\",\"input_id\":\"4cbb1204-bfd7-48f5-9db3-82ea899ebb36\",\"source_file\":\"/var/lib/docker/containers/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8-json.log\",\"state_id\":\"native::5767743-64768\",\"finished\":false,\"os_id\":\"5767743-64768\",\"old_source\":\"/var/lib/docker/containers/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8/810632c43f0f078c864aeee76b575e2f63ec55f0b904e7c51db44138cab3d6b8-json.log\",\"old_finished\":true,\"old_os_id\":\"5767743-64768\",\"ecs.version\":\"1.6.0\"}\n","stream":"stderr","time":"2023-08-16T14:42:59.576491143Z"}
It causes logs to not be shipped, so they are lost.
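To see which files are affected, one option is to scan Filebeat's own JSON log output for the error and collect the source_file field. A sketch, with the field names (message, source_file) taken from the log samples in this thread:

```python
import json

ERR = "registry already stopped"


def affected_files(lines):
    """Collect source_file values from Filebeat JSON log lines reporting the error."""
    files = set()
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. plain-text log output)
        if ERR in event.get("message", ""):
            files.add(event.get("source_file", "<unknown>"))
    return sorted(files)


# Example usage against Filebeat's own log file (path is an assumption):
#   with open("/var/log/filebeat/filebeat") as f:
#       print(affected_files(f))
```

The resulting list can be compared against the pods whose logs stopped arriving, to confirm the error correlates with the missing data.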
@tetianakravchenko any update on that issue?
Getting the exact same thing. Filebeat sporadically stops processing some pod logs, while others on the same node work fine. Restarting the Filebeat DaemonSet solves the issue.
After upgrading to Filebeat 8.11.3, this was solved, for me at least.
Filebeat 7.16.3 with an autodiscover configuration sporadically prints the error "Harvester could not be started on existing file" due to "Err: registry already stopped". This does not lead to data loss. @ChrsMark suggested creating a public GH issue so that someone from the team can investigate this further and see if we can improve anything, most probably in the messaging/error handling, to avoid such confusion in the future.