amir20 / dozzle

Realtime log viewer for docker containers.
https://dozzle.dev/
MIT License

need help with docker swarm compose #3048

Closed falkheiland closed 3 months ago

falkheiland commented 3 months ago

i am running dozzle in a 3 node docker swarm mode cluster (all nodes are manager nodes) behind traefik. when i open the dozzle web interface, it only shows the containers on the node that traefik happened to select. the Swarm Mode slider is not enabled by default.

https://dozzle.dev/guide/swarm-mode says:

But it does mean that each host needs to be setup with Dozzle.

this seems to be the only requirement for the config.

i also tried using the dev.dozzle.group label, since i am not sure about the services / no service usage. i am sure i am missing something here... can you maybe provide a docker swarm mode specific docker-compose.yml as reference?

docker-compose.yml:

version: "3.8"
services:
  dozzle:
    image: amir20/dozzle:v7.0.6
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      - DOZZLE_HOSTNAME={{.Node.Hostname}}
      - DOZZLE_NO_ANALYTICS=1
      - DOZZLE_BASE=/dozzle
      - DOZZLE_ENABLE_ACTIONS=true
      - DOZZLE_LEVEL=debug
    networks:
      - proxy
    deploy:
      mode: global
      placement:
        constraints:
          - node.role == manager
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 3
        window: 120s
      labels:
        - "dev.dozzle.group=monitoring-label"
        - "traefik......"

docker stack ps monitoring:

5cmpl98082pv   monitoring_dozzle.9mpz1pedpjgvxkto9gidpz987     amir20/dozzle:v7.0.6               n2     Running         Running 4 minutes ago             
e1zcxqnl4hee   monitoring_dozzle.hmnioh735sindvlf3o67uclvx     amir20/dozzle:v7.0.6               n3     Running         Running 4 minutes ago             
sezl9geq3jn5   monitoring_dozzle.i6ti0prto11qsmlxuvtcanp50     amir20/dozzle:v7.0.6               n1     Running         Running 4 minutes ago   

dozzle service logs:

 docker service logs monitoring_dozzle
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:47+02:00" level=info msg="Dozzle version v7.0.6"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:47+02:00" level=debug msg="filterArgs = {map[]}"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:47+02:00" level=debug msg="Connected to local Docker Engine"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:47+02:00" level=info msg="Connected to 1 Docker Engine(s)"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:47+02:00" level=debug msg="subscribing to docker events from container store localhost"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:47+02:00" level=debug msg="Analytics disabled."
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:47+02:00" level=info msg="Accepting connections on :8080"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:48+02:00" level=debug msg="container 6fc90b69186d started"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="resetting timer for container stats collector localhost"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="starting to stream stats for: 6fc90b69186d"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:45:47+02:00" level=info msg="Dozzle version v7.0.6"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:45:47+02:00" level=debug msg="filterArgs = {map[]}"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:45:47+02:00" level=debug msg="Connected to local Docker Engine"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:45:47+02:00" level=info msg="Connected to 1 Docker Engine(s)"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:45:47+02:00" level=debug msg="Analytics disabled."
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:45:47+02:00" level=debug msg="subscribing to docker events from container store localhost"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:45:47+02:00" level=info msg="Accepting connections on :8080"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:45:48+02:00" level=debug msg="container 8be7937ed5d3 started"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="starting to stream stats for: 4f76367075a5"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="starting to stream stats for: 4a4ba942c306"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="starting to stream stats for: 6231439d52d4"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="starting to stream stats for: a5d88383dcf6"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="starting to stream stats for: ff96046caff5"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="starting to stream stats for: f626c268366d"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="starting to stream stats for: 25bc4f4fed0a"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="subscribing to docker events from stats collector localhost"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:45:58+02:00" level=debug msg="starting to stream stats for: ae07849b8114"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:46:18+02:00" level=debug msg="health status for container 6fc90b69186d is healthy"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:46:18+02:00" level=debug msg="triggering docker health event: health_status: healthy"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:47:55+02:00" level=debug msg="context done, closing event stream"
monitoring_dozzle.0.alnhtdo04f5y@n2    | time="2024-06-19T16:47:55+02:00" level=debug msg="scheduled to stop container stats collector localhost"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:45:59+02:00" level=debug msg="Cache miss for []releases.Release"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:46:18+02:00" level=debug msg="health status for container 8be7937ed5d3 is healthy"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:47:56+02:00" level=debug msg="resetting timer for container stats collector localhost"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:47:56+02:00" level=debug msg="starting to stream stats for: 801746695b49"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:47:56+02:00" level=debug msg="starting to stream stats for: f7ae7cddd531"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:47:56+02:00" level=debug msg="starting to stream stats for: 8be7937ed5d3"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:47:56+02:00" level=debug msg="starting to stream stats for: 3cb064e86ff4"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:47:56+02:00" level=debug msg="starting to stream stats for: 591b02d99f48"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:47:56+02:00" level=debug msg="subscribing to docker events from stats collector localhost"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:49:10+02:00" level=debug msg="streaming logs for container" id=801746695b49 since="2024-06-19 13:08:41.699760843 +0000 UTC" stdType=all
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:49:12+02:00" level=debug msg="closing container channel streamContainerLogs"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:49:12+02:00" level=debug msg="context cancelled"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:49:12+02:00" level=debug msg="runtime mem stats" allocated="4.7 MB" routines=38 system="12 MB" totalAllocated="8.9 MB"
monitoring_dozzle.0.pehx5gg91dtm@n1    | time="2024-06-19T16:49:24+02:00" level=debug msg="streaming logs for container" id=3cb064e86ff4 since="<nil>" stdType=all
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:45:49+02:00" level=info msg="Dozzle version v7.0.6"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:45:49+02:00" level=debug msg="filterArgs = {map[]}"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:45:49+02:00" level=debug msg="Connected to local Docker Engine"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:45:49+02:00" level=info msg="Connected to 1 Docker Engine(s)"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:45:49+02:00" level=debug msg="Analytics disabled."
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:45:49+02:00" level=info msg="Accepting connections on :8080"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:45:49+02:00" level=debug msg="subscribing to docker events from container store localhost"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:45:49+02:00" level=debug msg="container 5d7a7b4c0f4e started"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:46:05+02:00" level=debug msg="streaming logs for container" id=b3186dc7dfdf since="<nil>" stdType=all
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:46:20+02:00" level=debug msg="health status for container 5d7a7b4c0f4e is healthy"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:46:48+02:00" level=debug msg="closing container channel streamServiceLogs"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:46:48+02:00" level=debug msg="context cancelled"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:46:48+02:00" level=debug msg="runtime mem stats" allocated="3.5 MB" routines=15 system="11 MB" totalAllocated="4.3 MB"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:47:57+02:00" level=debug msg="Cache miss for []releases.Release"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:48:14+02:00" level=debug msg="streaming logs for container" id=b3186dc7dfdf since="<nil>" stdType=all
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:49:11+02:00" level=debug msg="closing container channel streamServiceLogs"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:49:11+02:00" level=debug msg="context cancelled"
monitoring_dozzle.0.9d0tjrqrcyzo@n3    | time="2024-06-19T16:49:11+02:00" level=debug msg="runtime mem stats" allocated="4.0 MB" routines=17 system="12 MB" totalAllocated="7.7 MB"

amir20 commented 3 months ago

Hi @falkheiland!

That is correct. The missing part for you is setting up each node. Dozzle v7 implemented all the logic to manage services and stacks across nodes. However, the nodes still need to be set up using DOZZLE_REMOTE_HOST. This is not perfect, but I am thinking about it.

Here is an example using socket proxy with swarm that should work:

services:
  dozzle:
    image: amir20/dozzle:latest
    ports:
      - "7575:8080"
    environment:
      DOZZLE_REMOTE_HOST: tcp://<yourfirstdockernodehostnamehere>-doz_proxy:2375,tcp://<yourseconddockernodehostnamehere>-doz_proxy:2375,etc...
    deploy:
      replicas: 1
      update_config:
        delay: 20s
        failure_action: rollback   

  proxy:
    image: tecnativa/docker-socket-proxy:0.1.2
    hostname: "{{.Node.Hostname}}-{{.Service.Name}}"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      CONTAINERS: 1
      INFO: 1      
    deploy:
      mode: global
      update_config:
        delay: 20s
        failure_action: rollback    

Note the use of DOZZLE_REMOTE_HOST, as documented at https://dozzle.dev/guide/remote-hosts. Each swarm node then needs to run tecnativa/docker-socket-proxy.
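
The comma-separated DOZZLE_REMOTE_HOST value can get long on bigger clusters, so here is a minimal sketch of generating it. It assumes each proxy task is reachable as `<node hostname>-<full service name>`, which is what `hostname: "{{.Node.Hostname}}-{{.Service.Name}}"` produces, and that the full service name is `<stack>_<service>` (for example `doz_proxy` for a stack called `doz`). The function name `build_remote_hosts` and the node names are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build a DOZZLE_REMOTE_HOST value from node hostnames.
# Assumption: every socket-proxy task answers on port 2375 under the name
# "<node>-<service>", as set by the hostname template in the compose file.
build_remote_hosts() {
  service="$1"   # full service name, e.g. "doz_proxy"
  shift
  out=""
  for node in "$@"; do
    # Append a comma only when out already holds an entry.
    out="${out:+$out,}tcp://${node}-${service}:2375"
  done
  printf '%s\n' "$out"
}

# Example call with three illustrative node names.
build_remote_hosts doz_proxy n1 n2 n3
```

The node names could be taken from `docker node ls --format '{{.Hostname}}'` on a manager node. The value still has to be regenerated whenever nodes join or leave the swarm.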

On the roadmap

In the future, I'd like to leverage mode: global and remove the need for the proxy. However, that is not implemented yet. Currently, Dozzle does the swarm grouping based on what it can see. Other solutions like dockge or portainer have agents. Personally, I think that's too much. But I am thinking maybe Dozzle can communicate with itself.

One of the biggest reasons I haven't implemented it is that I feel remote hosts are a superset of everything, and they do work with swarm too. The only downside is that one would need to manage their own nodes. So if you have a lot of nodes updating, it would be a pain.

  1. Let me know if the example solution works for you
  2. Let me know what your thoughts are on the future of Dozzle with swarm support. I think it makes sense to have a simpler setup, but I don't know if there are any tradeoffs to just having remote hosts via socket-proxy.

Thanks!

falkheiland commented 3 months ago

i did the changes and am now facing another problem.

time="2024-06-19T17:51:54+02:00" level=warning msg="Could not connect to remote host tcp:n1-monitoring_dozzle_proxy:2375: error during connect: Get \"http://n1-monitoring_dozzle_proxy:2375/v1.45/containers/json?all=1\": dial tcp: lookup n1-monitoring_dozzle_proxy on 127.0.0.11:53: server misbehaving"

i can ping the hostname from all the dozzle_proxies to all the dozzle_proxies, i also can get the file http://n3-monitoring_dozzle_proxy:2375/v1.45/info (which also appears in the logs) from all the dozzle_proxies.

also a nc on 2375 is responding from dozzle_proxy to dozzle_proxy.

but i cannot test from the dozzle container itself - since i cannot get a console session.

OCI runtime exec failed: exec failed: unable to start container process: exec: "sh": executable file not found in $PATH: unknown

 *  The terminal process "/bin/bash '-c', 'docker exec -it 5b71b6cdad50f39858b6e096a739d01fe52be024926f1f90c06f2a5a45dbb5bd sh'" failed to launch (exit code: 126). 
 *  Terminal will be reused by tasks, press any key to close it. 

is there a way to debug from that container itself?

amir20 commented 3 months ago

Hmmm. Can you enable debug mode? https://dozzle.dev/guide/debugging

Admittedly, I have never seen a "server misbehaving" error. I will do a little research.

Does curl http://n1-monitoring_dozzle_proxy:2375/v1.45/containers/json work?

I guess this sort of answers my second question. One con of having to use a proxy is the confusion of setting it up. I imagine having an agent similar to your first setup would have avoided that. :)

amir20 commented 3 months ago

Found some info at https://stackoverflow.com/questions/28332845/docker-network-issue-server-misbehaving which might suggest DNS issues. I have never seen these so I think something is different about your setup.

I forgot to answer your last question:

is there a way to debug from that container itself?

Not easily. There is nothing installed in the Docker image except the Dozzle binary. But you could change the Dockerfile to build from alpine instead and have access to curl and other commands.

However, first I would look at resolving the issue as suggested in the Stack Overflow answer.
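
Since the Dozzle image has no shell, one workaround that avoids rebuilding it is to probe the proxy from a throwaway container on the same network. The sketch below only prints the command rather than running it. It assumes the overlay network is attachable (a plain `docker run` cannot join a non-attachable swarm overlay network) and uses the `curlimages/curl` image, which is my choice and not something from this thread. The function name `proxy_probe_cmd` and the network name `monitoring_internal` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a command that queries a socket proxy
# from a temporary container attached to the network Dozzle uses.
# Assumptions: the network is attachable, and the proxy allows the /info
# endpoint (INFO: 1 in the proxy's environment).
proxy_probe_cmd() {
  network="$1"   # overlay network shared by dozzle and the proxies
  host="$2"      # proxy hostname, e.g. n1-monitoring_dozzle_proxy
  printf 'docker run --rm --network %s curlimages/curl -sS http://%s:2375/v1.45/info\n' \
    "$network" "$host"
}

# Print the command for one proxy; review it, then run it on a swarm node.
proxy_probe_cmd monitoring_internal n1-monitoring_dozzle_proxy
```

If the printed command resolves the name while Dozzle still reports "server misbehaving", that would point at something specific to the Dozzle task (for example the set of networks it is attached to) rather than at the proxies.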

amir20 commented 3 months ago

Any update? Did it work?

falkheiland commented 3 months ago

so. after a lot of trial and error - i finally deployed the same thing to a seemingly similarly configured cluster - and here it works just fine.

i will try and figure out what is different between those installations and will post here if i find it out.

@amir20 thank you for your quick response, love the product!

falkheiland commented 3 months ago

well, i do face the same problem on that initially working cluster as i did with the first one. the name resolution via name spaces seems to cause issues, at least for me. as much as i like the product - i will not be able to use it in this environment.

amir20 commented 3 months ago

@falkheiland I had seen some other messages from you, but they are not here. I am not sure what happened. I'll try to respond but might be losing some context.

if i run the stack only with the internal network (obv no web ui access via traefik then) it works (from the logs). as soon as i add the overlay network in the mix, the dozzle service does not seem to be able to get the name resolution for the dozzle-proxies.

I have noticed that in all your examples the remote host points to some domain name. Have you tried using the actual IP address of the node, e.g. DOZZLE_REMOTE_HOST: tcp://10.0.1.2:2375? My hunch is that Docker doesn't like the DNS.

since i cannot get a console session on the dozzle container to debug, i configured the dozzle-proxy containers to also have both the external and the internal networks running. here the name resolution via the set hostnames works w/o a problem - the dozzle container seems to have that problem only for me

Would it be helpful if I created a PR with alpine?

i will not be able to use it in this environment

It's unfortunate. If you can give me some way to reproduce this in AWS, DigitalOcean or something else, then I can try testing it myself. I use Orbstack, which comes with VM support. I did set up 3 VMs and it did seem to work. I haven't tried attaching a proxy network. Maybe that's next.

Finally, I think a lot of these issues are related to using socket proxy. In the back of my mind, I think creating an agent that would remove the need for socket-proxy might fix it. I have started thinking about it. It seems like a lot of work, but maybe I can do it in my spare time as a fun project. It would require setting up gRPC, mesh, and distributed computing to have all agents talk to each other.

For now, if you are able to reproduce this for me using some kind of compose file and VMs locally then I can try to debug.

amir20 commented 3 months ago

Created https://github.com/amir20/dozzle/issues/3052. Feedback welcomed.

falkheiland commented 3 months ago

since i have been away for the last days: #3052 (agent support) would be the best option imho.

amir20 commented 3 months ago

@falkheiland try it out. Instructions are at https://github.com/amir20/dozzle/pull/3058

I still have some work to do, but I think for your use case it should be pretty easy.