docker / for-linux

Docker Engine for Linux
https://docs.docker.com/engine/installation/
757 stars 86 forks source link

BUG: fluentd driver does not properly capture and send logs #1346

Open nhatfield opened 2 years ago

nhatfield commented 2 years ago

This is a bug from what I can tell. When sending logs within the container, to /dev/stdout and /dev/stderr it logs to the console as expected docker logs <container> . However when I switch to fluentd driver, I can visually see the settings in inspect, but it continues to log to the console!

Expected behavior

docker logs command should not work, and logs should be getting forwarded to fluentd

Actual behavior

docker logs command continues to work and logs are sent to the console and cached under /var/lib/docker/container//container-cached.log

Steps to reproduce the behavior

Create a container and mount a log volume:

docker run -d -v /opt/logs:/var/logs/app/app.log --log-driver fluentd

Setup symlink to /dev/stdout for the log file

ln -s /dev/stdout /opt/logs/app/app.log ln -s /dev/stderr /opt/logs/app/err.log

Output of docker version:

docker version
Client: Docker Engine - Community
 Version:           20.10.12
 API version:       1.41
 Go version:        go1.16.12
 Git commit:        e91ed57
 Built:             Mon Dec 13 11:45:41 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.12
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.12
  Git commit:       459d0df
  Built:            Mon Dec 13 11:44:05 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.12
  GitCommit:        7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc:
  Version:          1.0.2
  GitCommit:        v1.0.2-0-g52b36a2
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

 docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.7.1-docker)
  scan: Docker Scan (Docker Inc., v0.12.0)

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 22
 Server Version: 20.10.12
 Storage Driver: devicemapper
  Pool Name: docker-253:3-805426520-pool
  Pool Blocksize: 65.54kB
  Base Device Size: 10.74GB
  Backing Filesystem: xfs
  Udev Sync Supported: true
  Data file: /dev/loop0
  Metadata file: /dev/loop1
  Data loop file: /var/lib/docker/devicemapper/devicemapper/data
  Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
  Data Space Used: 4.235GB
  Data Space Total: 107.4GB
  Data Space Available: 103.1GB
  Metadata Space Used: 21.14MB
  Metadata Space Total: 2.147GB
  Metadata Space Available: 2.126GB
  Thin Pool Minimum Free Space: 10.74GB
  Deferred Removal Enabled: true
  Deferred Deletion Enabled: true
  Deferred Deleted Device Count: 0
  Library Version: 1.02.170-RHEL7 (2020-03-24)
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2 io.containerd.runtime.v1.linux
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-1160.53.1.el7.x86_64
 Operating System: CentOS Linux 7 (Core)
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 23.37GiB
 Name: hntblockchain-etl.blackem.net
 ID: U65O:CPIM:VAAP:AISD:CQLK:ALAO:LWMJ:YRMJ:5RWP:STW4:33QA:SFS2
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Additional environment details (AWS, VirtualBox, physical, etc.) Local environment running on CentOS 7

akerouanton commented 2 years ago

Hello @nhatfield,

docker logs command should not work, and logs should be getting forwarded to fluentd

Actually there's been a feature added to Docker v20.10.0 (see #40543) that duplicates logs when remote loggers are used. This is known as "dual logging". This duplication allows to always use docker logs, no matter what logger you set. So what you see is the expected behavior :wink: However, if you don't want this behavior, you can still disable it (see Disable the dual logging cache in the doc).

docker run -d -v /opt/logs:/var/logs/app/app.log --log-driver fluentd

This command is incomplete: there's no param set to configure the fluentd driver (eg. --log-opt fluentd-address=fluentdhost:24224) and there's no image specified. Could you provide a full command please and your daemon config if you declared fluentd params there?

Also could you answer these questions please?

nhatfield commented 2 years ago

Actually there's been a feature added to Docker v20.10.0 (see #40543) that duplicates logs when remote loggers are used.

Definitely did not know this! Thank you.

This command is incomplete: there's no param set to configure the fluentd driver

Sorry i didnt type the full command in. The image is one that you wouldnt have access to. I have many many services that are using this same fluentd config though. Here's the full command

  docker run -d --init --log-driver fluentd \
    --log-opt tag=blockchain-etl \
    --log-opt fluentd-address=logs.domain.net:5007 \
    --log-opt fluentd-async=true \
    -v /opt/etl_data:/var/data:rw \
    -v /opt/etl_data/config:/opt/etl/config:rw \
    -p 2154:2154 -p 8080:8080 \
    -p 44159:44159 --name etl \
    -e DATABASE_URL=postgresql://etl:password@blockchain-db:5432/blockchain etl:1.1.133 

Did you double-check fluentd is running and accessible?

Yes it is always running and accessible, there are many services logging through it

How did you check no logs were sent to fluentd?

Kibana!

For example:

image

I can find all my services except this one which is odd. So im not sure what is special about this one service. I've even tried recreating it several times... I'm also running a different container on the same host and it is logging fine

akerouanton commented 2 years ago

Sorry i didnt type the full command in.

Your docker run command looks normal.

I can find all my services except this one which is odd.

Little question: what do you mean by "all my services"? Are you talking about docker-compose (or Swarm) services? If that's the case, are you sure you fluentd pipeline isn't relying on a metadata set by docker-compose?

So im not sure what is special about this one service. I've even tried recreating it several times... I'm also running a different container on the same host and it is logging fine

There're too many moving parts. As you reported logs for that container where displayed by docker logs, we can exclude the issue coming from your app. However, there's still Docker, Fluentd, Elasticsearch and Kibana involved here.

Could you try running tcpdump in parallel of your buggy container? If you have too much containers on that host and this makes running tcpdump cumbersome, you can try to set up netcat to forward what it receives to fluentd and use tcpdump on the port netcat is listening to.

Once you confirm with tcpdump that Docker isn't sending logs to fluentd, could you try to remove all the unrelated flags from docker run (eg. -e, -p, -v, --name), use a public image and use a dummy shell script that write debug logs to container's stdout? With this process you could either find the cause of this issue is related to some misconfiguration/problem on your side, or obtain a "minimal bug reproducer" that anyone can run on their computer to do further debug.