amir20 / dozzle

Realtime log viewer for docker containers.
https://dozzle.dev/
MIT License

Traefik v3: Certain logs do not show up when compression middleware is enabled #3015

Closed: ManiMatter closed this issue 3 months ago

ManiMatter commented 3 months ago

Describe the bug: Dozzle does not display the logs for ghcr.io/gethomepage/homepage:latest. No logs are shown in Dozzle.

However, sudo docker logs homepage does show:

Listening on port 3000

Note: The same problem also occurs for the container "glances". nicolargo/glances:latest-full

There is no output in Dozzle, but the console shows:

sudo docker logs glances

INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:61208 (Press CTRL+C to quit)

To Reproduce: Install homepage, run it, and check the logs.

Expected behavior: See the mentioned log in Dozzle.

Desktop (please complete the following information):

OS:

Client: Docker Engine - Community
 Version: 26.1.3
 API version: 1.45
 Go version: go1.21.10
 Git commit: b72abbb
 Built: Thu May 16 08:33:29 2024
 OS/Arch: linux/amd64
 Context: default

Server: Docker Engine - Community
 Engine:
  Version: 26.1.3
  API version: 1.45 (minimum version 1.24)
  Go version: go1.21.10
  Git commit: 8e96db1
  Built: Thu May 16 08:33:29 2024
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.32
  GitCommit: 8b3b7ca2e5ce38e8f31a34f35b2b68ceb8470d89
 runc:
  Version: 1.1.12
  GitCommit: v1.1.12-0-g51d5e94
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

Dozzle version: 7.0.4

amir20 commented 3 months ago

I ran:

❯ docker run ghcr.io/gethomepage/homepage:latest
Unable to find image 'ghcr.io/gethomepage/homepage:latest' locally
latest: Pulling from gethomepage/homepage
94747bd81234: Already exists
2750124feef8: Pull complete
52f6e1aafda6: Pull complete
5a5fe732514f: Pull complete
8c37a8585a94: Pull complete
e28d09e6ad37: Pull complete
254e58021b13: Pull complete
461e14a25727: Pull complete
56a6e022e2d4: Pull complete
c8c64a2e2c75: Pull complete
f34de3f0f790: Pull complete
Digest: sha256:5356c97b51e3cc817bed93612b4e57b39d28048ab9e4e3b346e827160cf0923e
Status: Downloaded newer image for ghcr.io/gethomepage/homepage:latest
Listening on port 3000

Seems to work here.

I don't expect glances to work since it is dynamic.

I am not sure why it's not working for you. A couple of things:

  1. Does Dozzle itself have any errors in its logs?
  2. Are there any proxy issues?
  3. Does it work for any other containers?

I won't have much time to investigate so you'll have to figure out what's different.

amir20 commented 3 months ago

It's worth reading https://gethomepage.dev/latest/configs/settings/#log-path. You might be writing the logs to a file. You need to write to stdout, which should be the default.

ManiMatter commented 3 months ago

Hey @amir20,

many thanks for your quick reply.

To answer your questions:

1. Does Dozzle itself have any error in logs?

No, dozzle does not show any logs (in dozzle). I thought that this might be expected since the logs get produced before dozzle is fully up and running. sudo docker logs dozzle shows:

time="2024-06-04T10:27:04Z" level=info msg="Dozzle version v7.0.4"
time="2024-06-04T10:27:04Z" level=info msg="Connected to 1 Docker Engine(s)"
time="2024-06-04T10:27:04Z" level=info msg="Accepting connections on :8080"
time="2024-06-06T14:42:35Z" level=info msg="stopped collecting container stats"

My docker-compose for dozzle looks like this:

 dozzle:
   <<: *common-keys-core 
   image: amir20/dozzle:latest
   container_name: dozzle
   environment:
     DOZZLE_LEVEL: info
     DOZZLE_FILTER: "status=running"
   volumes:
     - /var/run/docker.sock:/var/run/docker.sock 
   labels:
     - "traefik.enable=true"
     - "traefik.http.routers.dozzle-rtr.entrypoints=https"
     - "traefik.http.routers.dozzle-rtr.rule=Host(`dozzle.$DOMAINNAME_CLOUD_SERVER`)"
     - "traefik.http.routers.dozzle-rtr.tls=true"
     - "traefik.http.routers.dozzle-rtr.middlewares=chain-oauth@file"
     - "traefik.http.routers.dozzle-rtr.service=dozzle-svc"
     - "traefik.http.services.dozzle-svc.loadbalancer.server.port=8080"

Homepage looks like this:

 homepage:
   <<: *common-keys-core
   image: ghcr.io/gethomepage/homepage:latest
   container_name: homepage
   volumes:
     - $DOCKERDIR/appdata/homepage:/app/config
     - $DOCKERDIR/appdata/homepage/images:/app/public/images
     - $DOCKERDIR/appdata/homepage/images:/app/public/icons
     - /storage/internal:/internal
     - /home:/home
   environment:
     <<: *default-tz-puid-pgid
   labels:
     - "traefik.enable=true"
     - "traefik.http.routers.homepage-rtr.entrypoints=https"
     - "traefik.http.routers.homepage-rtr.rule=Host(`homepage.$DOMAINNAME_CLOUD_SERVER`) || Host(`$DOMAINNAME_CLOUD_SERVER`) || Host(`www.$DOMAINNAME_CLOUD_SERVER`)"
     - "traefik.http.routers.homepage-rtr.middlewares=chain-oauth@file"
     - "traefik.http.routers.homepage-rtr.service=homepage-svc"
     - "traefik.http.services.homepage-svc.loadbalancer.server.port=3000"

2. Are there any proxy issues?

I'm not sure I understand the question, would you mind elaborating?

3. Does it work for any other containers?

Yes, it works for 24 out of 27 containers. Only for dozzle, homepage and glances it does not seem to work. As an example, "flaresolverr" (one of the 24 containers that work) shows logs, and the respective docker-compose looks like this:

 flaresolverr:
   <<: *common-keys-apps
   image: ghcr.io/flaresolverr/flaresolverr:latest
   container_name: flaresolverr
   environment:
     <<: *default-tz-puid-pgid
   labels:
     - "traefik.enable=true"
     - "traefik.http.routers.flaresolverr-rtr.entrypoints=https"
     - "traefik.http.routers.flaresolverr-rtr.rule=Host(`flaresolverr.$DOMAINNAME_CLOUD_SERVER`)"
     - "traefik.http.routers.flaresolverr-rtr.tls=true"
     - "traefik.http.routers.flaresolverr-rtr.middlewares=chain-oauth@file"
     - "traefik.http.routers.flaresolverr-rtr.service=flaresolverr-svc"
     - "traefik.http.services.flaresolverr-svc.loadbalancer.server.port=8191" 

Not really sure why it works for most, but not for glances, homepage, and dozzle. Do you spot what's going on here?

You mentioned in your post above that you don't expect it to work for glances "since it's dynamic". What do you mean by this?

amir20 commented 3 months ago

Something is not right with Dozzle for you. Dozzle should show its own logs. In fact, I test it all the time with that.

My docker-compose for dozzle looks like this:

I don't see anything wrong here.

I'm not sure I understand the question, would you mind elaborating?

Are you running some kind of proxy in front of Dozzle? It seems like you are running Traefik. I use Traefik for most of my deployments too, so I doubt it can be that. Sometimes proxies don't flush HTTP headers correctly, so Dozzle looks broken.

Yes, it works for 24 out of 27 containers. Only for dozzle, homepage and glances it does not seem to work. As an example, "flaresolverr" (one of the 24 containers that work) shows logs, and the respective docker-compose looks like this:

Everything looks right there too.

You mentioned in your post above that you don't expect it to work for glances "since it's dynamic". What do you mean by this?

glances can run in terminal mode, which requires -it. In this mode, Docker creates an interactive session, which is how the UI keeps getting updated. This mode won't work in Dozzle. However, if you run it as a web UI, you should see the logs. I tested it and it does work.

I am a little out of ideas. Some things to try:

  1. What browser are you using?
  2. Are there any browser errors?
  3. For one container that doesn't work, try going to https://[dozzle-host]/api/hosts/localhost/containers/[container-id]/logs/stream?stdout=1&stderr=1 replacing [container-id] and [dozzle-host] accordingly. You should see a stream of logs in your browser. If that doesn't work, then it's not a browser issue.
  4. Try running Dozzle directly without Traefik. If that works, then something is going on with Traefik. Make sure you have the latest version of Traefik. This might actually be the issue, because the containers that don't work have very few logs, so they might not be getting flushed.

I think your best bet is to get Dozzle working without any proxies or maybe even locally and then start introducing traefik and/or other configurations.
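
The stream check from step 3 can also be scripted. What follows is only a minimal sketch, assuming Python 3 and an endpoint that answers without an interactive login (behind an OAuth middleware the request would need credentials, which this sketch does not handle). `probe_stream` is a hypothetical helper name, not part of Dozzle.

```python
import time
import urllib.request


def probe_stream(url, timeout=15.0):
    """Open a streaming URL and time how long the first line takes to arrive.

    Returns (status, first_line, seconds_waited). If something in front of
    the server holds the response back, this call waits longer or raises a
    timeout or HTTP error instead of returning promptly.
    """
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # readline() blocks until one complete line has reached the client.
        first_line = response.readline()
        waited = time.monotonic() - started
        return response.status, first_line, waited
```

Calling it once with the stream URL through the proxy and once with the direct address of the same Dozzle instance should show whether the delay is introduced somewhere in between.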

ManiMatter commented 3 months ago

Many thanks for the follow-up. On your questions:

1. What browser are you using?

Google Chrome, Version 125.0.6422.114 (Official Build) (x86_64)

2. Are there any browser errors?

The network tab shows these errors when loading dozzle (irrespective of which container is being selected); Not sure what it means or if it's related. Will investigate.

Error with Permissions-Policy header: Unrecognized feature: 'vr'.
0022b48ca6be:39 Refused to execute inline script because it violates the following Content Security Policy directive: "default-src 'self'". Either the 'unsafe-inline' keyword, a hash ('sha256-xVm9XzkXAV+5WTtvarZKf3KXkDXL89Qfs5Kb9oNPhqA='), or a nonce ('nonce-...') is required to enable inline execution. Note also that 'script-src' was not explicitly set, so 'default-src' is used as a fallback.

3. Check stream

I tried this link for a container that works, and I saw the logs as plain text. The corresponding link for the dozzle container gets a 524 error response (timeout).

https://dozzle.xxxx.com/api/hosts/localhost/containers/0022b48ca6be/logs/stream?stdout=1&stderr=1

4. Latest version of traefik

Currently, I am using traefik 3.0.1, which is the latest stable. I'll investigate your tip to try run dozzle without traefik and report back.

amir20 commented 3 months ago

I have a feeling 524 means something is going on with your proxy. Let me know what happens when you try Dozzle directly. I'll try to update Traefik on my side when I get a chance and see if it happens to me.

ManiMatter commented 3 months ago

I'll try Dozzle directly later today and report back.

I wanted to ask you in the meantime: I don't understand what the proxy (in my case Traefik) has to do with Dozzle. Sure, I open Dozzle via a URL (traefik.mydomain.com), and Traefik comes into play there so that I can open the website, which works.

However, in my understanding for fetching the logs dozzle connects to the docker socket:

   volumes:
     - /var/run/docker.sock:/var/run/docker.sock 

Thus I would have thought Dozzle should get the same result as running sudo docker logs XXX in the terminal.

How does the proxy matter for fetching the logs? Many thanks for your explanation.

ManiMatter commented 3 months ago

mh. Here's an interesting observation.

I connected to dozzle via my home network (e.g. 192.168.1.6:7070), and voilà: I can see the logs for homepage, dozzle, and glances. Beautiful.

Without any changes, I then connect to the very same Dozzle via the external URL routed through Traefik (e.g. dozzle.mydomain.com), and the logs are gone.

I am not sure I understand why the proxy matters for dozzle to read out the logs. Given your earlier question about proxy, I feel you have a hunch what's going on?

amir20 commented 3 months ago

The way a proxy works is by intercepting everything between the client and Dozzle. It can buffer, transform and change the output. In this case, it sounds like Traefik is not flushing the stream correctly. Dozzle uses server-sent events (SSE), which require flushing in real time. Further explanation is beyond the scope of this issue.

I suspect this has been broken in v3 of Traefik. It broke temporarily in v2, but they reverted it.

I am going to reopen the issue and try testing with v3. If it is something I can do on my side to fix it, then I will, but it sounds like Traefik introduced a bug.

A temporary solution could be to use v2.
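
For readers unfamiliar with SSE, here is a toy endpoint to make the flushing requirement concrete. It is only a sketch in Python, not Dozzle's actual code (which is not shown in this thread); `sse_event`, `LogStreamHandler` and the sample log line are illustrative assumptions. The `X-Accel-Buffering: no` header is the one that comes up later in this thread.

```python
import http.server


def sse_event(line):
    """Encode one log line as a server-sent event frame."""
    return f"data: {line}\n\n".encode("utf-8")


class LogStreamHandler(http.server.BaseHTTPRequestHandler):
    """Toy SSE endpoint that writes one event per log line."""

    # Stand-in for real container output.
    log_lines = ["Listening on port 3000"]

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        # Conventional hint asking proxies not to buffer this response.
        self.send_header("X-Accel-Buffering", "no")
        self.end_headers()
        for line in self.log_lines:
            self.wfile.write(sse_event(line))
            # Flush after every event. http.server's wfile is unbuffered by
            # default, so this is a no-op here, but buffered writers need it.
            self.wfile.flush()

    def log_message(self, format, *args):
        """Silence the default per-request logging."""
```

Each event only helps the viewer if it reaches the browser right away. Anything in the path that collects bytes before forwarding them, such as a buffering or compressing proxy, can hold back a short event indefinitely, which would match the symptom of containers with very few log lines showing nothing.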

amir20 commented 3 months ago

I tried locally and it seems to work.

services:
  traefik:
    image: traefik:v3.0.1
    command:
      - --api.insecure=true
      - --api.dashboard=true
      - --providers.swarm.exposedByDefault=false
      - --providers.swarm.endpoint=unix:///var/run/docker.sock
      - --providers.docker.network=web
      - --entrypoints.web.address=:8000
    ports:
      - "8000:8000"
      - "8080:8080"
    networks:
      - web
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
  dozzle:
    image: amir20/dozzle:latest
    networks:
      - web
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    depends_on:
      - traefik
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.dozzle.rule=Host(`dozzle.localhost`)"
      - "traefik.http.services.dozzle.loadbalancer.server.port=8080"

networks:
  web:
    external: true

I haven't tried with HTTPS since it's hard to set that up locally.

@ManiMatter any ideas what's different or how I can reproduce this? I was almost certain I'd reproduce it with v3.0.1. Based on what you said in https://github.com/amir20/dozzle/issues/3015#issuecomment-2154722714, it is definitely caused by something outside of Dozzle.

ManiMatter commented 3 months ago

I am comparing your setup with mine and am trying to decipher which setting may be causing it. While I am doing that, let me share my traefik config with you:

traefik:
   <<: *common-keys-core
   container_name: traefik
   image: traefik:latest
   command:
     - --global.checkNewVersion=true
     - --global.sendAnonymousUsage=true
     - --entryPoints.http.address=:80
     - --entryPoints.https.address=:443
     - --entrypoints.https.forwardedHeaders.trustedIPs=$CLOUDFLARE_IPS,$LOCAL_IPS
     - --entryPoints.traefik.address=:8080
     - --api=true
     - --api.dashboard=true
     - --log=true
 #    - --log.filePath=/logs/traefik.log
     - --log.level=debug
     - --accessLog=true
     - --accessLog.filePath=/logs/access.log
     - --accessLog.bufferingSize=100
     - --accessLog.filters.statusCodes=204-299,400-499,500-599
     - --providers.docker=true
     - --providers.docker.endpoint=unix:///var/run/docker.sock
     - --providers.docker.exposedByDefault=false
     - --entrypoints.https.http.tls.options=tls-opts@file
     - --entrypoints.https.http.tls.certresolver=dns-cloudflare
     - --entrypoints.https.http.tls.domains[0].main=$DOMAINNAME_CLOUD_SERVER
     - --entrypoints.https.http.tls.domains[0].sans=*.$DOMAINNAME_CLOUD_SERVER
     - --providers.docker.network=t3_proxy
    #  - --providers.docker.swarmMode=false
     - --providers.file.directory=/rules
     - --providers.file.watch=true
#     - --certificatesResolvers.dns-cloudflare.acme.caServer=https://acme-staging-v02.api.letsencrypt.org/directory # LetsEncrypt Staging Server - uncomment when testing
     - --certificatesResolvers.dns-cloudflare.acme.email=$CLOUDFLARE_EMAIL
     - --certificatesResolvers.dns-cloudflare.acme.storage=/acme.json
     - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.provider=cloudflare
     - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.resolvers=1.1.1.1:53,1.0.0.1:53
     - --certificatesResolvers.dns-cloudflare.acme.dnsChallenge.delayBeforeCheck=50 # To delay DNS check and reduce LE hitrate
   ports:
     - target: 80
       published: 80
       protocol: tcp
       mode: host
     - target: 443
       published: 443
       protocol: tcp
       mode: host
   volumes:
     - /var/run/docker.sock:/var/run/docker.sock:ro
     - $DOCKERDIR/appdata/traefik3/rules/cloudserver:/rules
     - $DOCKERDIR/appdata/traefik3/acme/acme.json:/acme.json
     - $DOCKERDIR/appdata/traefik3/logs:/logs
     - $DOCKERDIR/shared:/shared
   environment:
     <<: *default-tz-puid-pgid
     CF_API_EMAIL: $CLOUDFLARE_EMAIL
     CF_API_KEY: $CLOUDFLARE_API_KEY
     DOMAINNAME_CLOUD_SERVER: $DOMAINNAME_CLOUD_SERVER
   labels:
     - "traefik.enable=true"
     # HTTP-to-HTTPS Redirect
     - "traefik.http.routers.http-catchall.entrypoints=http"
     - "traefik.http.routers.http-catchall.rule=HostRegexp(`{host:.+}`)"
     - "traefik.http.routers.http-catchall.middlewares=redirect-to-https"
     - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
     # HTTP Routers
     - "traefik.http.routers.traefik-rtr.entrypoints=https"
     - "traefik.http.routers.traefik-rtr.rule=Host(`traefik.$DOMAINNAME_CLOUD_SERVER`)"
     - "traefik.http.routers.traefik-rtr.tls=true"
#     - "traefik.http.routers.traefik-rtr.tls.certresolver=dns-cloudflare" # Comment out this line after first run of traefik to force the use of wildcard certs
     - "traefik.http.routers.traefik-rtr.tls.domains[0].main=$DOMAINNAME_CLOUD_SERVER"
     - "traefik.http.routers.traefik-rtr.tls.domains[0].sans=*.$DOMAINNAME_CLOUD_SERVER"
     # Services - API
     - "traefik.http.routers.traefik-rtr.service=api@internal"
     # Middlewares
     - "traefik.http.routers.traefik-rtr.middlewares=chain-oauth@file"

One difference I can already spot is that you use swarm mode whereas I use standalone (i.e. providers.docker.XXX as opposed to providers.swarm.XXX), but it'd be premature of me to claim that this is the root cause. The other obvious difference is that I connect with https (and redirect http to https).

amir20 commented 3 months ago

I doubt swarm mode would affect this, because I suspect both would be proxied similarly.

I am afraid you've got a lot going on for me to test. I would take my yml file first and see if that works. If it does, start iteratively updating the configuration until it breaks. :) I don't have a better solution.

ManiMatter commented 3 months ago

I'll approach it exactly as you suggest. Will report back once I find the 🐛, and thank you for your kind help on this. Much appreciated.

amir20 commented 3 months ago

I did update one of my servers to v3.0.1 with https. It seemed to work. 🤔 So not sure what's different anymore.

ManiMatter commented 3 months ago

I did update one of my servers to v3.0.1 with https. It seemed to work. 🤔 So not sure what's different anymore.

How does your docker-compose look for traefik on that server?

amir20 commented 3 months ago

Pretty similar. Here it is

services:
  traefik:
    image: traefik:v3.0.1
    command:
      - --api.insecure=true
      - --api.dashboard=true
      - --providers.swarm.exposedbydefault=false
      - --providers.swarm.endpoint=unix:///var/run/docker.sock
      - --providers.swarm.network=web
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.myresolver.acme.tlschallenge=true
      - --certificatesresolvers.myresolver.acme.email=findamir@gmail.com
      - --certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
      - target: 8080
        published: 8080
        protocol: tcp
    networks:
      - web
    volumes:
      - /data/letsencrypt:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock:ro

networks:
  web:
    external: true

ManiMatter commented 3 months ago

We're one step closer. I am using a series of middlewares with my Traefik. If I turn off compression, the logs pop up. I am trying to understand why that happens.

I have two leads I am trying to follow: on the one hand, I see the log entry below (which may be innocent and completely unrelated), and on the other, I am trying to understand whether adding a specific option with excludedContentTypes to the compression logic makes the logs flow correctly.

Unable to parse MIME type error="mime: no media type" middlewareName=middlewares-compress@file middlewareType=Compress

# Instruction to traefik to use middlewares chain
   labels:
    ....
     - "traefik.http.routers.traefik-rtr.middlewares=chain-oauth@file"
# Chain of middlewares
http:
  middlewares:
    chain-oauth:
      chain:
        middlewares:
          - middlewares-rate-limit
          - middlewares-https-redirectscheme
          - middlewares-secure-headers
          - middlewares-oauth
          - middlewares-compress  # THIS CAUSES IT
# Compression middleware
    middlewares-compress:
      compress: {}

ManiMatter commented 3 months ago

Solved it:

    middlewares-compress:
      compress: 
        excludedContentTypes:
          - text/event-stream 

Thank you again for your help. Highly appreciate your time and your work.
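
One plausible reason why excluding text/event-stream helps: a streaming gzip compressor tends to hold short input until it is flushed or closed. The sketch below shows that generic behaviour with Python's zlib module. It is an illustration under that assumption only, not Traefik's implementation, and the thread does not establish how the compress middleware flushes. `gzip_stream_demo` and `decodable` are hypothetical helper names.

```python
import zlib


def gzip_stream_demo(line):
    """Show what a streaming gzip compressor emits before and after a flush.

    Returns (before_flush, after_flush): the bytes produced by compress()
    alone, and the extra bytes released by an explicit sync flush.
    """
    # wbits=31 selects the gzip container format.
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, 31)
    before_flush = compressor.compress(line)
    after_flush = compressor.flush(zlib.Z_SYNC_FLUSH)
    return before_flush, after_flush


def decodable(chunks):
    """Return the payload a client could decode from the chunks received so far."""
    decompressor = zlib.decompressobj(31)
    return b"".join(decompressor.decompress(chunk) for chunk in chunks)
```

Feeding a single short line such as b"Listening on port 3000\n" through `gzip_stream_demo` and passing only the first chunk to `decodable` shows what a client would see if the compressor were never flushed while the stream stays open.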

amir20 commented 3 months ago

Sounds great. I think Traefik's compression middleware is ignoring X-Accel-Buffering: no because I do send it.

If you get a chance, please do send a PR for the FAQ https://dozzle.dev/guide/faq

ManiMatter commented 3 months ago

Hi @amir20, I did not find the X-Accel-Buffering: no in the request or response header of the stream request. Is that where I should find it? Just to understand correctly for when I raise this with traefik.

amir20 commented 3 months ago

[Image: Screenshot 2024-06-08 at 2 21 29 PM]

It's on the response for log stream.

ManiMatter commented 3 months ago

Interesting. Does not look like that for me:

[Image]

amir20 commented 3 months ago

It might be Traefik altering the headers. What do the headers look like without traefik proxy?

ManiMatter commented 3 months ago

[Image]

amir20 commented 3 months ago

So it's there. :) Not sure why traefik is removing it. I guess it is a signal to proxies to not buffer so they can remove it.
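
The same comparison can be done without screenshots. This is a minimal sketch, assuming Python 3 and that both addresses can be fetched without a login; `response_headers` and `dropped_headers` are hypothetical helper names.

```python
import urllib.request


def response_headers(url, timeout=10.0):
    """Fetch url and return its response headers with lower-cased names."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return {name.lower(): value for name, value in response.headers.items()}


def dropped_headers(direct, proxied):
    """List header names present on the direct response but missing via the proxy."""
    return sorted(set(direct) - set(proxied))
```

Calling `response_headers` on the log stream URL once via the direct address and once via the proxied hostname, then passing both results to `dropped_headers`, lists what the proxy removed. If the proxy holds the response back entirely, the second call may time out instead of returning headers.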