gliderlabs / logspout

Log routing for Docker container logs
MIT License
4.65k stars 679 forks source link

Log are not processed after some time #309

Open matgenois opened 7 years ago

matgenois commented 7 years ago

I've subscribed to logdna.com and to get my log to there interface, they made me install logspout with this command line : sudo docker run --name="logdna" --restart=always -d -v=/var/run/docker.sock:/var/run/docker.sock -e SYSLOG_STRUCTURED_DATA="logdna@48950 tag=\"docker\"" gliderlabs/logspout syslog+tls://syslog-a.logdna.com:37555 We use Docker Cloud and AWS to host our node.js RESTful API. Our app log using Bunyan and we are logging to stdout. After some time (few hours), log stop flowing to logdna. When I restart container logspout, everything is fine again. I've seen a closed issue about dockercloud/logrotate would be the cause. logspout docker image Id is 87895b8ba614. it seems like the latest version.

Any ideas? Tell me what you need so I can help you helping me :) Thank you!

Admin EDIT: this case includes docker rotating logs and seems related to #216

michaelshobbs commented 7 years ago

What output do you see from docker logs logspout prior to restarting the container?

matgenois commented 7 years ago

Thanks for your reply! Here what I have :

{"log":"2017/06/06 03:59:09 syslog: write tcp 172.17.0.2:57198-\u003e40.114.47.198:37555: write: connection timed out\n","stream":"stderr","time":"2017-06-06T03:59:09.648844167Z"} {"log":"2017/06/06 03:59:09 syslog: retrying tcp up to 10 times\n","stream":"stderr","time":"2017-06-06T03:59:09.648873473Z"} {"log":"2017/06/06 03:59:30 syslog: retry failed\n","stream":"stderr","time":"2017-06-06T03:59:30.111101136Z"} {"log":"2017/06/06 03:59:30 syslog: reconnecting up to 10 times\n","stream":"stderr","time":"2017-06-06T03:59:30.111161867Z"} {"log":"2017/06/06 03:59:30 syslog: reconnect successful\n","stream":"stderr","time":"2017-06-06T03:59:30.407016144Z"}

michaelshobbs commented 7 years ago

That looks like logspout reconnected to the syslog server. Have you tried sending logs after you see reconnect successful?

matgenois commented 7 years ago

Yes, logs continue to flow in other containers but are not routed by logspout to our syslog server.

michaelshobbs commented 7 years ago

Currently logspout may drop some logs during the reconnect process. Just to confirm, containers are sending logs right after you see the reconnect successful message in the logspout output, correct?

matgenois commented 7 years ago

No. Containers are not sending logs right after the reconnect successful. I must restart logspout to get containers logs.

michaelshobbs commented 7 years ago

Apologies, what I meant to say there was, are logs being generated from other containers immediately after (or within say 5 minutes of) you seeing the reconnect successful output?

matgenois commented 7 years ago

Yes logs are generated from other containers during and immediately after the reconnect successful output. Our others containers are logging a lot of stuff.

michaelshobbs commented 7 years ago

I use papertrail with the new reconnect logic and it does seem to work. Perhaps logdna is doing something different?

michaelshobbs commented 7 years ago

Ah I initially didn't relate #216 to this issue. It should be clarified that this is most likely due to docker's log rotation. Need to verify though.

michaelshobbs commented 7 years ago

@matgenois can you provide your docker daemon options?

matgenois commented 7 years ago

We use the default options from docker cloud. Based on this page (https://success.docker.com/Cloud/Solve/How_do_you_set_custom_docker_daemon_options_in_the_Docker_Cloud_Agent%3F), here's my dockercloud-agent.conf : {"CertCommonName":"2dc92577-669b-4c50-80c7-f03ef4af2631.node.dockerapp.io","DockerHost":"tcp://0.0.0.0:2375","Host":"https://cloud.docker.com/","Token":"3ef633ceb03d4e379d1264256f0d6858","UUID":"2dc92577-669b-4c50-80c7-f03ef4af2631","DockerOpts":""}

michaelshobbs commented 7 years ago

Unfortunately that doesn't show me how your daemon logging is setup. Can you run ps auxwww | grep dockerd | grep -v grep and provide the output?

michaelshobbs commented 7 years ago

I was able to recreate this as outlined in #216 with --log-opt max-size=10k. Logspout just stops sending logs with no error msg.

matgenois commented 7 years ago

Sorry I don't have a good expertise on docker. Your command return nothing but what I can understand is since I run through docker cloud, dockerd is not running in the same way. Maybe this what you want?

root 879 0.0 0.2 521956 9956 ? S<sl May31 3:38 /usr/bin/dockercloud-agent root 1259 0.9 1.2 1098468 48040 ? S<l May31 93:34 /usr/bin/docker daemon -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375 --userland-proxy=false --tlscert /etc/dockercloud/agent/cert.pem --tlskey /etc/dockercloud/agent/key.pem --tlscacert /etc/dockercloud/agent/ca.pem --tlsverify

michaelshobbs commented 7 years ago

Not familiar with that setup. No worries though. I was able to reproduce the issue with docker's log rotation. Confirmed with both logdna and papertrail so it's definitely a logspout issue.

michaelshobbs commented 7 years ago

duplicate of #255

matgenois commented 7 years ago

Glad you could have reproduced it but I would have liked better that the error would be on my side ;)

If I can help more, I will be happy to!!

matgenois commented 7 years ago

I've just seen this in /var/log/dockercloud/docker.log :

time="2017-06-14T17:45:50.328534625Z" level=warning msg="container f4f5159fe5955527a6d9ebc0d75d43651cca3a2120d5a345b408b7350bb50a15 restart canceled" time="2017-06-14T18:05:47.492101354Z" level=warning msg="container f4f5159fe5955527a6d9ebc0d75d43651cca3a2120d5a345b408b7350bb50a15 restart canceled" time="2017-06-14T18:20:26.228212877Z" level=warning msg="container f4f5159fe5955527a6d9ebc0d75d43651cca3a2120d5a345b408b7350bb50a15 restart canceled"

f4f5159fe... is the container id of logspout

matgenois commented 7 years ago

@michaelshobbs Any idea on how I could detect that logspout is no longer sending log to syslog so I can make a script that will restart logspout container? I'm looking at logspout log and I have no output and in dockercloud.log either.

Do you think theres other logrotate images that would not cause this problem?

matgenois commented 7 years ago

I have found a way to restart logspout after logrotate : https://serverfault.com/questions/856251/launch-script-after-dockercloud-logrotate-rotate-log

joshontheweb commented 7 years ago

+1 I'm also experiencing this issue. I'm using the ograhl/logspout-restart container atm to restart the logspout container when it the logs are rotated. It isn't ideal, so a real fix for this bug would be welcome :)

sfrique commented 7 years ago

I think i have a similar problem, can't be sure if it is the same problem. Anyway, restarting works, i will see how to do it.

digitalLumberjack commented 7 years ago

Hi, having the same issue, and INACTIVITY_TIMEOUT=XX described in the readme https://github.com/gliderlabs/logspout#detecting-timeouts-in-docker-log-streams fixes everything.

I set it to 30 seconds : INACTIVITY_TIMEOUT=30s and even with really tiny log files, it seems to be ok :

Here is the logging description of services I want to log :

    logging:
      driver: json-file
      options:
        max-size: 1m
        max-file: 1

And the logspout-logstash service :

version: '2'
services:
  logspout:
    image: bekt/logspout-logstash:latest
    environment:
      LOGSPOUT: ignore
      ROUTE_URIS: logstash://logstash:5001
      INACTIVITY_TIMEOUT: 30s
    stdin_open: true
    external_links:
    - elk/logstash:logstash
    volumes:
    - /var/run/docker.sock:/var/run/docker.sock
    tty: true
    logging:
      driver: json-file
      options:
        max-size: 50m
        max-file: 1

The bekt image contains a logstash module and is based on the latest logspout image.

Paveltarno commented 7 years ago

+1

mtricht commented 7 years ago

Would be cool if this was fixed without the INACTIVITY_TIMEOUT environment variable, any ideas on how we could achieve this?

alecbz commented 6 years ago
  1. Does the INACTIVITY_TIMEOUT fix guarantee no dropped logs, even for the period INACTIVITY_TIMEOUT is set to?
  2. ETA on a more robust fix?
kwent commented 6 years ago

Seeing the same issue with INACTIVITY_TIMEOUT set to 30s

2018-04-23
22:49:08 # logspout v3.3-dev by gliderlabs
22:49:08 # adapters: raw syslog tcp udp tls
22:49:08 # options : persist:/mnt/routes
22:49:08 # jobs : http[routes,logs]:80 pump routes
22:49:08 # routes :
22:49:08 # ADAPTER  ADDRESS CONTAINERS  SOURCES OPTIONS
22:49:08 # syslog+tls   syslog-a.logdna.com:6514    map[]

2018-04-24
06:59:55 06:59:55 syslog: write tcp 172.17.0.2:43816->40.114.41.103:6514: write: broken pipe
06:59:55 06:59:55 syslog: reconnecting up to 10 times
06:59:55 syslog: reconnect successful
07:00:34 syslog: write tcp 172.17.0.2:53288->40.114.40.67:6514: write: broken pipe
07:00:34 syslog: reconnecting up to 10 times
07:00:34 syslog: reconnect successful

Despite this is saying reconnect successful the all process is never resuming. Restarting the all container works... Any kind of health-check would help as well.

forestjohnsonpeoplenet commented 4 years ago

https://github.com/gliderlabs/logspout/issues/377#issuecomment-537708428

Hossy commented 1 year ago

@matgenois does INACTIVITY_TIMEOUT resolve your issue? Can this be closed?