amir20 / dozzle

Realtime log viewer for docker containers.
https://dozzle.dev/
MIT License
6.08k stars, 305 forks

Unable to connect to dozzle-agent on the same network #3344

Open f616 opened 11 hours ago

f616 commented 11 hours ago

🔍 Check for existing issues

How is Dozzle deployed?

Agents

📦 Dozzle version

v8.6.2

✅ Command used to run Dozzle

dozzle-agent running on machine 10.220.2.100

docker run -v /var/run/docker.sock:/var/run/docker.sock -p 7007:7007 amir20/dozzle:latest agent 

dozzle running on machine 10.220.2.99

docker run -p 8080:8080 amir20/dozzle:latest --remote-agent 10.220.2.100:7007

🐛 Describe the bug / provide steps to reproduce it

dozzle-agent running on machine 10.220.2.100

{"level":"info","time":"2024-10-25T17:02:16Z","message":"Dozzle agent version v8.6.2"}
{"level":"info","time":"2024-10-25T17:02:16Z","message":"Agent listening on [::]:7007"}

dozzle running on machine 10.220.2.99 is not able to connect to the agent

{"level":"info","time":"2024-10-25T17:02:59Z","message":"Dozzle version v8.6.2"}
{"level":"warn","error":"rpc error: code = DeadlineExceeded desc = received context error while waiting for new LB policy update: context deadline exceeded","endpoint":"10.220.2.100:7007","time":"2024-10-25T17:03:02Z","message":"error fetching host info for agent"}
{"level":"fatal","time":"2024-10-25T17:03:02Z","message":"Could not connect to any Docker Engine"}

Both machines are on the same network in my corporate private cloud. The issue might be caused by a firewall restriction; however, I do have connectivity from 10.220.2.99 to 10.220.2.100.

Example of commands run on 10.220.2.99 to check for connectivity:

nc -zv 10.220.2.100 7007
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.220.2.100:7007.
Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
curl --noproxy "*" http://10.220.2.100:7007/health
curl: (52) Empty reply from server

Is it normal to get a curl: (52) Empty reply from server? Any clues on how I can troubleshoot this?

Thank you, f616

💻 Environment

Client: Docker Engine - Community
 Version:    27.0.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.15.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.28.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 12
  Running: 7
  Paused: 0
  Stopped: 5
 Images: 14
 Server Version: 27.0.1
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-112-generic
 Operating System: Ubuntu 22.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.58GiB
 Name: UBTPWHPT003.corporativo.pt
 ID: 4935c36b-5734-4c9b-820d-c05cfafa57bc
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http://proxy.xyz.mngt.local:8080
 HTTPS Proxy: http://proxy.xyz.mngt.local:8080
 No Proxy: localhost,127.0.0.1,docker-registry.example.com,.corp
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

📸 If applicable, add screenshots to help explain your bug

No response

📜 If applicable, attach your Dozzle logs. You may need to enable debug mode. See https://dozzle.dev/guide/debugging.

No response

amir20 commented 11 hours ago

Hmmm, you have tried everything I would have tried. It does seem like a firewall issue.

Is it normal to get a curl: (52) Empty reply from server?

Yes, agents use a private SSL cert which rejects all clients.
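To make the curl result easier to read during troubleshooting, here is a minimal sketch that maps a few documented curl exit codes to what they suggest about the connection. The helper name `explain_curl_exit` is hypothetical and is not part of Dozzle or curl. The wording of each hint is an interpretation, not an official diagnosis.

```shell
#!/usr/bin/env bash
# explain_curl_exit is a hypothetical helper, not part of Dozzle or curl.
# It maps a few documented curl exit codes to a rough hint about the link.
explain_curl_exit() {
  case "$1" in
    0)  echo "HTTP response received" ;;
    7)  echo "could not connect: port closed or blocked" ;;
    28) echo "timed out: packets may be dropped on the way" ;;
    52) echo "connected, but the server sent no HTTP reply" ;;
    *)  echo "other curl error: see the EXIT CODES section of man curl" ;;
  esac
}

# Usage against the agent from this thread (requires network access):
#   curl --noproxy "*" -sS -m 5 http://10.220.2.100:7007/health
#   explain_curl_exit $?
explain_curl_exit 52
```

Exit code 52 means the TCP connection was established, which matches the nc result above. Codes 7 or 28 would point more directly at a firewall or routing problem.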

You can try adding --level debug to see if any new information is printed. My hunch is no since there is nothing new to log when an agent fails.
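As a sketch of what that could look like with the commands from the report, the hypothetical helper below only prints the two `docker run` invocations with debug logging enabled. It does not start any containers. The UI command uses `--level debug` as suggested above. The agent command assumes that the `DOZZLE_LEVEL` environment variable is accepted as the equivalent of `--level`, which should be checked against the Dozzle documentation.

```shell
#!/usr/bin/env bash
# debug_commands is a hypothetical helper: it prints commands, it does not run them.
# Argument: the agent address in host:port form.
debug_commands() {
  local agent_addr="$1"
  # Agent side (run on the agent machine).
  # Assumption: DOZZLE_LEVEL is the environment equivalent of --level.
  echo "docker run -e DOZZLE_LEVEL=debug -v /var/run/docker.sock:/var/run/docker.sock -p 7007:7007 amir20/dozzle:latest agent"
  # UI side (run on the machine that connects to the agent).
  echo "docker run -p 8080:8080 amir20/dozzle:latest --level debug --remote-agent ${agent_addr}"
}

debug_commands 10.220.2.100:7007
```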

example of commands run on 10.220.2.99 to check for connectivity.

For me, nc shows a different output.

nc -vz test.com 7007
Connection to test.com port 7007 [tcp/afs3-bos] succeeded!

Not sure why. Perhaps because I am on a Mac.

I think the best option is to add more log statements. But I don't know what to log since it is the gRPC client that is rejecting the connection.

Let me know if you have any ideas. I'll keep thinking too.
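One idea that may be worth ruling out is the proxy. This is an assumption, not something confirmed in this thread. The `docker info` output above lists an HTTP/HTTPS proxy, and the curl test needed `--noproxy "*"`. If proxy variables are also present in the Dozzle container's environment, the gRPC client might try to reach the agent through the proxy. The sketch below filters a list of `KEY=VALUE` lines down to proxy-related variables; `filter_proxy_vars` is a hypothetical helper name and `<container>` is a placeholder.

```shell
#!/usr/bin/env bash
# filter_proxy_vars is a hypothetical helper: it reads KEY=VALUE lines on stdin
# and keeps only proxy-related variables, in either letter case.
filter_proxy_vars() {
  # "|| true" keeps the function from failing when nothing matches.
  grep -iE '^(http_proxy|https_proxy|no_proxy|all_proxy)=' || true
}

# Usage against a real container (requires Docker; <container> is a placeholder):
#   docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' <container> | filter_proxy_vars

# Self-contained demonstration with sample input:
printf 'PATH=/bin\nHTTPS_PROXY=http://proxy.xyz.mngt.local:8080\nno_proxy=localhost\n' | filter_proxy_vars
```

If proxy variables do show up, re-running the Dozzle container with `-e NO_PROXY=10.220.2.100` would be one way to test this hypothesis.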