runningman84 opened this issue 7 years ago
Experiencing the same issue here:
$logspout --version
v3.2-dev-custom
$ docker version
Client:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 23:26:11 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.12.3
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   6b644ec
 Built:        Wed Oct 26 23:26:11 2016
 OS/Arch:      linux/amd64

[root@ip-XXXXXXX rancher]# docker info
Containers: 56
 Running: 44
 Paused: 0
 Stopped: 12
Images: 89
Server Version: 1.12.3
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local rancher-nfs
 Network: null bridge overlay host
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.24-rancher
Operating System: RancherOS v0.7.1
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 14.94 GiB
Name: ip-XXXXXXXXXXX.ec2.internal
ID: DVSV:NLW5:VD5U:UOYY:FIOB:4RTH:7NGQ:GIBO:KKTR:VPYO:J6A2:4SZE
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
Logspout container logs with DEBUG enabled show a constant stream of:
2017/01/26 17:37:44 pump.pumpLogs(): 495069bafc93 stopped
2017/01/26 17:37:45 pump.pumpLogs(): 495069bafc93 started
2017/01/26 17:37:45 pump.pumpLogs(): 25635c95f804 stopped
2017/01/26 17:37:45 pump.pumpLogs(): 25635c95f804 started
2017/01/26 17:37:46 pump.pumpLogs(): 495069bafc93 stopped
2017/01/26 17:37:46 pump.pumpLogs(): 495069bafc93 started
2017/01/26 17:37:46 pump.pumpLogs(): a27d4bfb3a41 stopped
2017/01/26 17:37:46 pump.pumpLogs(): a27d4bfb3a41 started
2017/01/26 17:37:46 pump.pumpLogs(): 32e734dcf238 stopped
2017/01/26 17:37:46 pump.pumpLogs(): 32e734dcf238 started
2017/01/26 17:37:47 pump.pumpLogs(): 495069bafc93 stopped
2017/01/26 17:37:47 pump.pumpLogs(): 495069bafc93 started
2017/01/26 17:37:48 pump.pumpLogs(): 495069bafc93 stopped
2017/01/26 17:37:48 pump.pumpLogs(): a27d4bfb3a41 stopped
2017/01/26 17:37:48 pump.pumpLogs(): 495069bafc93 started
2017/01/26 17:37:48 pump.pumpLogs(): a27d4bfb3a41 started
@pythianarceri Are you able to provide a docker-compose.yml I can use to reproduce the issue?
We use a custom-built image compiled with the logstash output module:
logspout:
  image: pythiant9shared/logspout-logstash
  environment:
    DEBUG: '1'
    ROUTE_URIS: logstash://logstash:5000
  stdin_open: true
  external_links:
    - logstash/logstash-collector:logstash
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  tty: true
  labels:
    io.rancher.scheduler.global: 'true'
    io.rancher.container.hostname_override: container_name
Any luck getting this fixed, or at least the issue identified? Is it normal for the pumps to be started and stopped all the time? We are running under RancherOS 1.3.3, just in case that matters.
Same thing here. I end up seeing a lot of these messages in /var/log/docker.log
time="2017-03-13T21:36:26.491710944Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
time="2017-03-13T21:36:26.514911587Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
time="2017-03-13T21:36:26.547530212Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
It's happened a couple of times. Sometimes that "invalid character" is different, but the gist of the log message is the same. We see dockerd at around 100% CPU and a constant stream of those errors.
Hi all,
We've managed to reproduce the same issue as @pythianarceri reported on Jan 26. To add some detail: running docker logs -f {ContainerID} does not actually follow the logs; it exits straight away. That would be consistent with pumpLogs starting and stopping, since the HTTP connection is being closed for some reason.
This is reproducible on the latest master commit (7c22df4e9d780959b266d7a088ff8012171edbfb) as well as 513b74c70a31c20294507f3fb21c34e52b2df48e and 8b98313954b3283d7203d1c3984b5b69ae307e0b.
Our container runs a few processes inside, all of which have PYTHONUNBUFFERED=1.
Whilst I consider this a bug in our container not writing to stdout and stderr correctly, it also makes our docker nodes perform really badly when the containers are scheduled.
Can we add some kind of counter when the pump starts, so that if it restarts {n} times within a second it backs off?
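To illustrate the kind of throttling I mean, here is a minimal sketch; the followLogs helper and the thresholds are hypothetical placeholders, not logspout's actual pump internals:

package main

import (
	"log"
	"time"
)

// followLogs stands in for the call that streams one container's logs until
// the connection drops; it is a hypothetical placeholder, not logspout's API.
func followLogs(containerID string) error {
	time.Sleep(100 * time.Millisecond) // pretend the stream closed quickly
	return nil
}

// pumpWithBackoff restarts the log pump, but if it restarts more than
// maxRestarts times within one second it sleeps with exponential backoff
// instead of hammering the Docker daemon.
func pumpWithBackoff(containerID string) {
	const maxRestarts = 3
	const maxDelay = 30 * time.Second

	delay := time.Second
	restarts := 0
	windowStart := time.Now()

	for {
		err := followLogs(containerID)
		log.Printf("pump.pumpLogs(): %s stopped: %v", containerID, err)

		if time.Since(windowStart) > time.Second {
			// Quiet enough: reset the counter and the backoff.
			restarts, windowStart, delay = 0, time.Now(), time.Second
		}
		restarts++
		if restarts > maxRestarts {
			log.Printf("pump.pumpLogs(): %s is thrashing, backing off %s", containerID, delay)
			time.Sleep(delay)
			if delay *= 2; delay > maxDelay {
				delay = maxDelay
			}
		}
		log.Printf("pump.pumpLogs(): %s started", containerID)
	}
}

func main() {
	pumpWithBackoff("495069bafc93") // one of the flapping container IDs above
}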
# docker version
Client:
Version: 17.03.0-ce
API version: 1.26
Go version: go1.7.4
Git commit: 60ccb22
Built: Mon Mar 13 18:07:51 UTC 2017
OS/Arch: linux/arm
Server:
Version: 17.03.0-ce
API version: 1.26 (minimum version 1.12)
Go version: go1.7.4
Git commit: 60ccb22
Built: Mon Mar 13 18:07:51 UTC 2017
OS/Arch: linux/arm
Experimental: false
Same issue here. Reverted back to docker image v3.1 and logging and CPU usage is back to normal.
Same issue, we also reverted to docker image v3.1.
We are running Docker version 17.04.0-ce, build 4845c56, and deploying our logspout container with the new swarm mode ('docker stack deploy'). We're not using logstash, but forward logs to a remote syslog directly.
Here is the content of our docker-compose.yml file:
version: '3'
services:
  logspout:
    image: gliderlabs/logspout
    environment:
      SYSLOG_STRUCTURED_DATA: "- $SYSLOG_CONF"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - 10888:80
    command: syslog+tls://$LOGS_ENDPOINT
    deploy:
      mode: global
I have the same problem, using the most recent master image (ef55cfcceda7) and version v3.3-dev, on both docker 17.04.0-ce and 1.13.1.
It's fairly difficult to reproduce, and it seems to occur only after a period of time. I'm not sure whether it's a result of complex log messages (or potentially unexpected charsets), or just the volume of messages shipped off.
During the time that it's at high CPU usage, I have also seen a number of weird messages such as those mattdodge described.
@woutor @sebgl after you reverted back to version 3.1 - did you see the problem pop up again?
@gaieges: Nope, with 3.1 the CPU usage remains normal.
Same issue here with Docker 17.05.0-ce and logspout 3.3. We can't go back to 3.1 because we need the label filtering feature.
I'm using v3.3 in production with papertrail. Is there anybody experiencing this issue that is NOT using logstash? I'm thinking the problem might be in that module...
@michaelshobbs I don't use the logstash module, I just use the tcp+syslog output.
Interesting....anybody have any thoughts on this one?
Same issue here, but I cannot confirm the high CPU load; just a lot of log lines in /var/log/syslog, and the pump.pumpLogs() messages when debugging is enabled.
I am also using the logstash adapter.
.....
165+02:00" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"
Jun 16 11:55:18 xyz docker[11404]: time="2017-06-16T11:55:18.307515574+02:00" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"
Jun 16 11:55:18 xyz docker[11404]: time="2017-06-16T11:55:18.432245150+02:00" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"
....
We encountered the same issue today, randomly, on one of our hosts. Unfortunately I do not have any more info than what others have reported. Docker 17.03.1-ce, RancherOS v1.0.2, logspout configured to send to syslog+tcp.
Hello folks,
I was having the same problem:
Jun 28 14:07:20 xxx docker[7686]: time="2017-06-28T14:07:20.892703572Z" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"
Jun 28 14:07:20 xxx docker[7686]: time="2017-06-28T14:07:20.894972987Z" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"
and it turned out to be a corrupt json-file.log: it happened when I ran out of disk space, so the json-file.log was truncated.
I had something like this in my json-file.log after the disk space was fixed and docker was running again:
{"log":" 2017/06/28 07:46:00 [INFO] consul: adding WAN ser{"log":" 2017/06/28 07:46:00 [INFO] consul: adding WAN server xxx.dc1 (Addr: 10.240.0.21:8300) (DC: dc1)\n","stream":"stdout","time":"2017-06-28T07:46:00.5738544Z"}
So it isn't valid JSON, and it breaks docker logs; somehow it makes logspout enter a loop and makes docker print those lines.
If you run docker logs $container on the container with the corrupt json-file.log, you will get one line of the error.
So look for corrupted/badly formatted json-file.log files. I just removed the offending line and everything was working normally again.
Hope that helps.
On @sfrique's note: I wonder if using the journald log driver (if possible) instead of json-file would help with this?
@gaieges I tried with journald, same problem
We're seeing the same thing - 100% CPU usage when running logspout (pumping to logstash). Enabling logspout's debug logs showed rapid repetition of log entries such as:
2017/07/20 23:57:20 pump.pumpLogs(): fb6033455fd0 stopped
2017/07/20 23:57:20 pump.pumpLogs(): fb6033455fd0 started
Initially, we were seeing this on most of our hosts, with two container IDs showing up on each. The containers in question were Rancher's metadata and DNS containers. We upgraded Rancher from 1.6.2 to 1.6.5 and upgraded all the system services, and logspout now seems to run happily on most of our hosts. However, we're still getting the stopped/started log spam (and corresponding CPU load) on our DB hosts - with only a single container ID flopping: mongodb.
The scenario described by @pauln is exactly what we are experiencing (we haven't had a chance to upgrade to 1.6.5 yet). Out of curiosity, @pauln, what docker version are you running on your hosts?
@garceri, we're currently running Docker 17.03 CE, though we're looking at upgrading to 17.06 CE in the near future.
My thoughts on this issue:
DEBUG: true shows just the constant stream of pump start/stop:
2017/07/26 07:09:48 pump.pumpLogs(): 21c90bb7ac5c stopped
2017/07/26 07:09:48 pump.pumpLogs(): 21c90bb7ac5c started
2017/07/26 07:09:48 pump.pumpLogs(): 21c90bb7ac5c stopped
2017/07/26 07:09:48 pump.pumpLogs(): 21c90bb7ac5c started
@robcza thanks for the additional details. Can you provide a full debug log, from start to stop, of a logspout container that has entered this state? Looking for any other clues here.
To your question about the difference between versions: we tag each release, so you can browse the history by comparing tags. Additionally, we have a CHANGELOG.md with convenient links to compare each version.
Same for me on docker 17.03.1-ce using the image bekt/logspout-logstash:latest.
I have managed to get a full debug log since the logspout container was created.
There is a constant start/stop stream for c21bde2e91e7.
I have had this issue on other machines as well, with various services.
# docker logs elk-logspout
2017/07/28 10:50:57 setting allowTTY to: false
# logspout v3.3-dev-custom by gliderlabs
# adapters: raw udp tcp logstash
# options : debug:True persist:/mnt/routes
# jobs : http[]:80 pump routes
# routes :
# ADAPTER ADDRESS CONTAINERS SOURCES OPTIONS
# logstash+tcp  prod00:1111                          map[]
2017/07/28 10:50:57 pump.Run(): using inactivity timeout: 0s
2017/07/28 10:50:57 pump.pumpLogs(): 4dd58c94144a ignored: environ ignore
2017/07/28 10:50:57 pump.pumpLogs(): 741c807703d4 started
2017/07/28 10:50:57 pump.pumpLogs(): 03c007defa61 started
2017/07/28 10:50:57 pump.pumpLogs(): 0398b9b5542e started
2017/07/28 10:50:57 pump.pumpLogs(): 0b1b04fd17a9 started
2017/07/28 10:50:57 pump.pumpLogs(): 18b4f33a6b1d started
2017/07/28 10:50:57 pump.pumpLogs(): 18cd5d3b88cb started
2017/07/28 10:50:57 pump.pumpLogs(): dfebc3118ce3 started
2017/07/28 10:50:57 pump.pumpLogs(): e12708a548c4 started
2017/07/28 10:50:57 pump.pumpLogs(): 78423dfc9805 started
2017/07/28 10:50:57 pump.pumpLogs(): 872c498a0c2f started
2017/07/28 10:50:57 pump.pumpLogs(): 2f78048ee356 started
2017/07/28 10:50:57 pump.pumpLogs(): bcedcc0cd5d2 started
2017/07/28 10:50:57 pump.pumpLogs(): 0b8e0146d61c started
2017/07/28 10:50:57 pump.pumpLogs(): a1c7966d4ad3 started
2017/07/28 10:50:57 pump.pumpLogs(): d49788286a32 started
2017/07/28 10:50:57 pump.pumpLogs(): 41c5e05d0019 started
2017/07/28 10:50:57 pump.pumpLogs(): ce7030af1846 started
2017/07/28 10:50:57 pump.pumpLogs(): 78cff52b2cc8 started
2017/07/28 10:50:57 pump.pumpLogs(): c5b69f63a3e8 started
2017/07/28 10:50:57 pump.pumpLogs(): 52ab8eb7e542 started
2017/07/28 10:50:57 pump.pumpLogs(): f26b009798f4 started
2017/07/28 10:50:57 pump.pumpLogs(): 7b1f7d90859a started
2017/07/28 10:50:57 pump.pumpLogs(): a5b864e479b7 started
2017/07/28 10:50:57 pump.pumpLogs(): 08581c7c3829 started
2017/07/28 10:50:57 pump.pumpLogs(): 9e4497a0dcaf started
2017/07/28 10:50:57 pump.pumpLogs(): 227ca66d7f83 started
2017/07/28 10:50:57 pump.pumpLogs(): 6f6b327b7b9e started
2017/07/28 10:50:57 pump.pumpLogs(): 846ba10f0ee7 started
2017/07/28 10:50:57 pump.pumpLogs(): e672bc2d0fe2 started
2017/07/28 10:50:57 pump.pumpLogs(): f1d48ee0211f started
2017/07/28 10:50:57 pump.pumpLogs(): f8fdca2aff57 started
2017/07/28 10:50:57 pump.pumpLogs(): 5856ed01b7ff started
2017/07/28 10:50:57 pump.pumpLogs(): 57e77510b9de started
2017/07/28 10:50:57 pump.pumpLogs(): a05333be7c0f started
2017/07/28 10:50:57 pump.pumpLogs(): dcab95f7ca2f started
2017/07/28 10:50:57 pump.pumpLogs(): f28130fa353a started
2017/07/28 10:50:57 pump.pumpLogs(): 464d4d0b24bd started
2017/07/28 10:50:57 pump.pumpLogs(): 139fccfc578f started
2017/07/28 10:50:57 pump.pumpLogs(): 341f311f43c0 started
2017/07/28 10:50:57 pump.pumpLogs(): f08a73de23ab started
2017/07/28 10:50:57 pump.pumpLogs(): 7e88c606f51b started
2017/07/28 10:50:57 pump.pumpLogs(): c21bde2e91e7 started
2017/07/28 10:50:57 pump.pumpLogs(): 52fe6826666a started
2017/07/28 10:51:02 pump.pumpLogs(): c21bde2e91e7 stopped
2017/07/28 10:51:02 pump.pumpLogs(): c21bde2e91e7 started
2017/07/28 10:51:06 pump.pumpLogs(): c21bde2e91e7 stopped
2017/07/28 10:51:06 pump.pumpLogs(): c21bde2e91e7 started
2017/07/28 10:51:09 pump.pumpLogs(): c21bde2e91e7 stopped
2017/07/28 10:51:09 pump.pumpLogs(): c21bde2e91e7 started
2017/07/28 10:51:11 pump.pumpLogs(): c21bde2e91e7 stopped
2017/07/28 10:51:11 pump.pumpLogs(): c21bde2e91e7 started
2017/07/28 10:51:14 pump.pumpLogs(): c21bde2e91e7 stopped
2017/07/28 10:51:14 pump.pumpLogs(): c21bde2e91e7 started
Thanks @ko-christ for the extra detail! Next time this happens, it would be great to see if there's anything odd about the output of the container with the "thrashing" log pump (i.e. docker logs c21bde2e91e7 in this particular case).
Same request to others that have noticed the stop/start thrash symptom.
Hi all,
We've just encountered this issue recently. (logspout starts and CPU jumps to ~100%)
Here are my insights after investigating this for a couple of hours:
In our environment, this was caused by malformed JSON output in the container logs (when using json-file logging).
What apparently happened is that the machine ran out of disk space (we had a clear indication of that) and log lines could not be written properly, so we got JSON log lines that were cut off in the middle.
When that happens, docker will print the following whenever someone tries to fetch the logs (even a simple docker logs command will reproduce this):
"Error streaming logs: invalid character 'l' after object key:value pair"
Apparently, docker prints this error and stops returning log lines, even if there are many more lines after the corrupt JSON line. On our machine we had about a dozen services constantly logging, and this JSON line corruption was found in most of them. It seems that docker encounters this and emits an error (https://github.com/moby/moby/blob/8af4db6f002ac907b6ef8610b237879dfcaa5b7a/daemon/logs.go#L109), yet logspout immediately starts requesting logs again, and docker again bombards the docker service log with the above error message (tens per second).
Possible workarounds for such an issue (I didn't verify these options yet):
So this actually seems to be a docker issue. The best way to deal with it, IMHO, is to have docker detect such cases, roll the log file, and put the corrupt log file aside (while emitting a meaningful warning about it).
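Until docker handles this itself, the corruption can at least be located from the client side. The sketch below is only an illustration, assuming the default json-file layout under /var/lib/docker/containers/; it decodes each log line and reports the first one that is not valid JSON:

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// findCorruptLine returns the first line number in a json-file.log that does
// not parse as JSON -- the kind of entry that makes dockerd stop streaming.
func findCorruptLine(path string) (int, bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, false, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	// Log lines can be long; allow up to 16 MiB per line.
	scanner.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	line := 0
	for scanner.Scan() {
		line++
		var entry map[string]interface{}
		if err := json.Unmarshal(scanner.Bytes(), &entry); err != nil {
			return line, true, nil
		}
	}
	return 0, false, scanner.Err()
}

func main() {
	// Default json-file location; adjust if your docker root dir differs.
	paths, err := filepath.Glob("/var/lib/docker/containers/*/*-json.log")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, p := range paths {
		switch line, corrupt, err := findCorruptLine(p); {
		case err != nil:
			fmt.Fprintf(os.Stderr, "%s: %v\n", p, err)
		case corrupt:
			fmt.Printf("%s: first corrupt entry at line %d\n", p, line)
		}
	}
}

Any NUL padding or a line truncated mid-object fails json.Unmarshal here, which matches the "invalid character" errors dockerd logs.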
One thing that puzzles me is why reverting to v3.1 "fixes" this issue (I didn't try reverting back to v3.1 yet, I'm using v3.3).
Hello, I am having the same issue. Docker log is flooded with messages like this:
time="2017-08-30T22:30:31.922624624Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
time="2017-08-30T22:30:31.977376555Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
time="2017-08-30T22:30:32.172320949Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
time="2017-08-30T22:30:32.570219033Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
time="2017-08-30T22:30:32.662642497Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
time="2017-08-30T22:30:32.860254549Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
And the docker logs for bekt/logspout-logstash:master are:
2017/08/30 22:18:33 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:33 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:34 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:34 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:34 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:34 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:34 pump.Run() event: 42ad09d40141 network:connect
2017/08/30 22:18:34 pump.Run() event: orient-backu volume:mount
2017/08/30 22:18:34 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:35 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:35 pump.Run() event: orient-datab volume:mount
2017/08/30 22:18:35 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:35 pump.Run() event: orient-log volume:mount
2017/08/30 22:18:35 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:35 pump.Run() event: rancher-cni volume:mount
2017/08/30 22:18:35 pump.Run() event: efe51a4c5ed0 volume:mount
2017/08/30 22:18:36 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:36 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:36 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:36 pump.Run() event: 58a80e107fc1 start
2017/08/30 22:18:36 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:36 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:36 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:36 pump.pumpLogs(): 58a80e107fc1 started
2017/08/30 22:18:36 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:36 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:36 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:36 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:37 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:37 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:37 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:37 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:37 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:37 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:37 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:37 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:38 pump.pumpLogs(): be68d546188d stopped
2017/08/30 22:18:38 pump.pumpLogs(): be68d546188d started
2017/08/30 22:18:38 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:38 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:38 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:38 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:38 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:38 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:39 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:39 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:39 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:39 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:39 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:39 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:39 pump.pumpLogs(): f6fddb18f835 started
2017/08/30 22:18:39 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:40 pump.pumpLogs(): 9ab5e8cc0c36 stopped
2017/08/30 22:18:40 pump.pumpLogs(): 9972da45bf76 stopped
2017/08/30 22:18:40 pump.pumpLogs(): 9972da45bf76 started
2017/08/30 22:18:40 pump.pumpLogs(): 9ab5e8cc0c36 started
2017/08/30 22:18:40 pump.pumpLogs(): f6fddb18f835 stopped
2017/08/30 22:18:40 pump.pumpLogs(): f6fddb18f835 started
I have trimmed some of the docker JSON logs and restarted the containers, and it is not helping; once the logspout container starts, it makes dockerd eat almost all the CPU.
Thanks in advance!
@mattatcha thoughts on this given new info?
@pythiancastaneda To figure out which container is giving you trouble:
# tail -f /var/log/syslog | grep docker
In a second terminal:
# docker logs "for each container you have"
That way you will see in the logs:
time="2017-08-30T22:30:32.860254549Z" level=error msg="Error streaming logs: invalid character '\\x00' looking for beginning of value"
And then you can remove the log file or the offending line.
Hope that helps.
I recently had the same problem; I followed the steps above and it fixed the issue.
Thanks for the update @stevecalvert! In my case the issue was the same: corrupt container logs. For me, 3.2.2 was also logging an additional error; posting it just in case:
logspout_1 | 2017/09/15 17:42:43 pump.pumpLogs(): 0e1c957edf2d started
logspout_1 | 2017/09/15 17:42:43 pump.pumpLogs(): 0e1c957edf2d stopped with error: Unrecognized input header: 3
logspout_1 | 2017/09/15 17:42:43 pump.pumpLogs(): 0e1c957edf2d started
logspout_1 | 2017/09/15 17:42:43 pump.pumpLogs(): 0e1c957edf2d stopped with error: Unrecognized input header: 3
The only way for me to fix the issue, like for others, was to remove and recreate the container.
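For context on that particular error: when a container runs without a TTY, dockerd multiplexes stdout and stderr into one stream, framing every chunk with an 8-byte header whose first byte is the stream type (0 = stdin, 1 = stdout, 2 = stderr) and whose last four bytes are the big-endian payload length. If the stream is corrupted, the reader can land on a byte such as 3 where the stream type should be, which is the kind of "Unrecognized input header" failure shown above. Below is a minimal sketch of reading that framing; it illustrates the documented format only, not logspout's or docker's actual code:

package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readFrames demultiplexes docker's log/attach stream framing:
// [stream type (1 byte)][3 zero bytes][payload length (4 bytes, big endian)][payload].
func readFrames(r io.Reader) error {
	header := make([]byte, 8)
	for {
		if _, err := io.ReadFull(r, header); err != nil {
			if err == io.EOF {
				return nil // clean end of stream
			}
			return err
		}
		streamType := header[0]
		if streamType > 2 {
			// This is the situation behind "Unrecognized input header: 3":
			// the byte where the stream type should be is garbage.
			return fmt.Errorf("unrecognized input header: %d", streamType)
		}
		size := binary.BigEndian.Uint32(header[4:8])
		payload := make([]byte, size)
		if _, err := io.ReadFull(r, payload); err != nil {
			return err
		}
		fmt.Printf("stream %d: %s", streamType, payload)
	}
}

func main() {
	// A well-formed stdout frame ("hello\n") followed by a corrupted header byte (3).
	stream := append([]byte{1, 0, 0, 0, 0, 0, 0, 6}, []byte("hello\n")...)
	stream = append(stream, []byte{3, 0, 0, 0, 0, 0, 0, 0}...)
	if err := readFrames(bytes.NewReader(stream)); err != nil {
		fmt.Println("error:", err)
	}
}

Once a log file or stream is damaged this way, every new read hits the same bad frame, which fits the remove-and-recreate workaround.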
@leriel I get the same issue with logspout 3.2.2 and the gelf plugin.
But I'm not sure I understand the workaround: which container do you recreate? Logspout, or the containers with corrupted json-file logs?
Is there any way to correct this in the logspout code?
Don't know if there is any relationship... but I get this message if I try to tail one of the *-json.log files reported by logspout in debug mode:
tail: inotify cannot be used, reverting to polling: Too many open files
This seems to be a default limit on Ubuntu; could that explain some of the corrupted json-file logs?
see https://nefaria.com/2014/08/tail-inotify-resources-exhausted/
Another piece of info: when I purge all the json-file logs that logspout reports as "stopped" and restart it, CPU usage on dockerd stays flat.
Here is my code:
for containerid in $(docker logs $(docker ps -a -q --filter "label=io.rancher.stack_service.name=graylog2/logspout") 2>&1| grep stopped | cut -d " " -f 4 | sort -u); do
ls -l /var/lib/docker/containers/${containerid}*/${containerid}*-json.log
> /var/lib/docker/containers/${containerid}*/${containerid}*-json.log
done
One more piece of information: I use docker engine 17.03.2-ce. Last week we had a power outage on the nodes; in my case, I think that was the cause of the corrupted json-file logs (Error streaming logs: invalid character 'l' after object key:value pair). Do you know if this issue can be solved with a new docker-ce release?
Here is some news, though I don't know if there is a relationship.
I get all the container IDs reported as stopped by logspout:
docker logs $(docker ps -a -q --filter "label=io.rancher.stack_service.name=graylog/logspout") 2>&1| grep stopped | cut -d " " -f 4 | sort -u
c062de60016b
c4d6b098968c
f8f18bba6203
I did indeed find '\x00' in each file that is corrupted, and the files are big:
for containerid in $(docker ps -a -q); do
grep -Pa '\x00' /var/lib/docker/containers/${containerid}*/${containerid}*-json.log
if [ $? -eq 0 ]; then
echo ${containerid}
ls -lh /var/lib/docker/containers/${containerid}*/${containerid}*-json.log
fi
done
{"log":"Creating volume folders...\n","stream":"stdout","time":"2017-09-16T15:50:34.379859731Z"}
c4d6b098968c
-rw-r----- 1 root root 564M Sep 21 12:11 /var/lib/docker/containers/c4d6b098968c475a82ed7ff34cc4e50344f097155f10d0d9d07528437b4417e7/c4d6b098968c475a82ed7ff34cc4e50344f097155f10d0d9d07528437b4417e7-json.log
{"log":"Creating volume folders...\n","stream":"stdout","time":"2017-09-16T15:50:33.076131673Z"}
c062de60016b
-rw-r----- 1 root root 478M Sep 21 12:11 /var/lib/docker/containers/c062de60016bd13faeebceb0301c121b35a0dfbc620048584a3d06c3d0449761/c062de60016bd13faeebceb0301c121b35a0dfbc620048584a3d06c3d0449761-json.log
{"log":"Creating volume folders...\n","stream":"stdout","time":"2017-09-16T15:50:47.996912867Z"}
f8f18bba6203
-rw-r----- 1 root root 806M Sep 21 12:11 /var/lib/docker/containers/f8f18bba62032e188dd75f76b9cc061936e804995289d7406b89d43e02d7cc36/f8f18bba62032e188dd75f76b9cc061936e804995289d7406b89d43e02d7cc36-json.log
We experienced the same issue yesterday. We don't use logspout, but it is the closest issue we found, with this kind of error logs:
Jun 28 14:07:20 xxx docker[7686]: time="2017-06-28T14:07:20.892703572Z" level=error msg="Error streaming logs: invalid character 'l' after object key:value pair"
And the invalid character can be anything, in our case.
We are still investigating, but would like to see if we can find the issue together.
With our current information, it looks like one of our containers is outputting some kind of JSON under specific conditions that dockerd doesn't like at all. Didn't find any related moby issue though :/
We'll let you know of our findings.
Our setup is different though, docker 1.12 + k8s 1.7.5 with scalyr agent ingesting logs from dockerd.
Hello I have:
$ docker version
Client:
Version: 17.09.0-ce
API version: 1.32
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:42:09 2017
OS/Arch: linux/amd64
Server:
Version: 17.09.0-ce
API version: 1.32 (minimum version 1.12)
Go version: go1.8.3
Git commit: afdb6d4
Built: Tue Sep 26 22:40:48 2017
OS/Arch: linux/amd64
Experimental: false
System: Debian 9.1
logspout is set up in Rancher, and in syslog I see these dockerd errors:
Oct 24 10:03:39 web4-vultr dockerd[31377]: time="2017-10-24T10:03:39.819294736+01:00" level=error msg="Error streaming logs: invalid character '\x00' looking for beginning of value" container=db62afc089ec method="(*Daemon).ContainerLogs" module=daemon
Oct 24 10:03:41 web4-vultr dockerd[31377]: time="2017-10-24T10:03:41.950449384+01:00" level=error msg="Error streaming logs: invalid character '\x00' looking for beginning of value" container=db62afc089ec method="(*Daemon).ContainerLogs" module=daemon
Oct 24 10:03:42 web4-vultr dockerd[31377]: time="2017-10-24T10:03:42.717452402+01:00" level=error msg="Error streaming logs: invalid character '\x00' looking for beginning of value" container=6488521fdc8f method="(*Daemon).ContainerLogs" module=daemon
and CPU usage is very high.
I don't know why, but downgrading logspout to version v3.1 fixes it.
We're on AWS and see this issue regularly. We have ~100 EC2 instances, and when AWS suddenly retires or shuts down an instance, the server restarts but some of the docker JSON log files end up corrupt.
I can see where the corruption is and what line by doing the following:
for FILE in /var/lib/docker/containers/*/*-json.log; do jq . "$FILE" > /dev/null || echo "corrupt: $FILE"; done
I used to just shut down the container that corresponded to the corrupt log(s), manually remove the corrupted (half-written) JSON line, then restart the container, and all was well. With newer versions of docker that no longer fixes the issue, so now I have to remove all the containers and re-deploy them. I think this is really a Docker issue: if Docker detects a half-written file, it needs to handle it correctly. Basically 100% of the time AWS restarts a node unexpectedly, this happens for us. Typically it's faster to completely replace the node than to spend a bunch of time trying to fix this by hand.
We have the same issue but found the offending log files using grep below.
grep -P '\x00' /var/lib/docker/containers/**/*json.log
After manually removing the null "\x00" characters from the log the issue went away. It seems to be a docker issue docker/for-linux#140 related to unexpected host/daemon restarts.
Me too, same issue... any suggestions?
We resolved to use #321 as a workaround for the time being (note that BACKLOG should not be set, or should be set to true).
I've run into 100% CPU usage and this error:
Mar 23 16:48:52 burger dockerd[569]: time="2018-03-23T16:48:52.971499270+02:00" level=error msg="Error streaming logs: EOF" container=b1dc4286ba72 method="(*Daemon).ContainerLogs" module=daemon
on the cadvisor container.
My config:
version: '2.3'
services:
  logspout:
    image: gliderlabs/logspout:v3.2.4
    container_name: logspout
    command: syslog+tcp://logs-01.loggly.com:514
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      TAIL: 10
Any update on this? Just had it happen on a server which became pretty unresponsive because of it
+upvoting rollback to 3.1
I've just force pushed ca675f3030a87f41536415114800470c0c74cd0d and manually added:
- glide.lock
- glide.yaml
- Dockerfile
- build.sh
- and also /adapters/multiline/multiline.go, to use multiline from the latest master when building an image
The new container works well.
I just wanted to confirm this issue with the latest CE version of Docker (19.03.12) and logspout (syslog+tls). We've had production servers crashing because of this. Dockerd was using 1600% CPU and then became unresponsive on a 32-CPU server. We noticed a very high number of goroutines showing in docker info in debug mode. Stopping the logspout container and waiting a few hours resolved the issue, and the dockerd CPU was back to normal (along with the number of goroutines). We tried workarounds like TAIL=0 without success. Reverting to v3.1 solves the issue for now.
We had about 30 containers running on that machine; logging was heavy but not excessive. It's easily reproducible with the latest logspout version: starting logspout with a lot of containers logging shows dockerd using high CPU almost immediately, increasing over time.
Maybe v3.1 queries the docker daemon differently than newer versions, which flood it with requests, leading to a high number of goroutines and high CPU?
If I start logspout-logstash using this docker-compose file, the CPU usage spikes to 100% and stays there...
(screenshots: CPU usage before starting logspout vs. after starting logspout)