acouvreur / sablier

Start your containers on demand, shut them down automatically when there's no activity. Docker, Docker Swarm Mode and Kubernetes compatible.
https://acouvreur.github.io/sablier/

Update docker_classic.go with delay #315

Closed. Sblop closed this 3 months ago.

Sblop commented 3 months ago

This change addresses the issue reported in #287, where CPU load reaches 100% when the DOCKER_HOST environment variable is set to a value other than the default /var/run/docker.sock. The infinite loop in the NotifyInstanceStopped method was causing excessive CPU usage due to continuous polling without any delay. Introducing a small delay (100 milliseconds) in the loop reduces CPU load while maintaining functionality, so the event loop no longer consumes excessive CPU resources, especially when connected to a remote Docker host.
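For context, a minimal sketch of the kind of change described above follows. Only the NotifyInstanceStopped name and the 100 ms delay come from the PR description; the package, imports, struct, and channel handling are assumptions added to keep the example self-contained and are not the actual docker_classic.go code.

package provider

import (
	"context"
	"log"
	"time"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

// DockerClassicProvider is a hypothetical stand-in for sablier's provider type.
type DockerClassicProvider struct {
	Client client.APIClient
}

// NotifyInstanceStopped listens for Docker events and forwards the name of
// each affected container. The time.Sleep at the end of the loop is the
// throttling delay proposed by this PR.
func (p *DockerClassicProvider) NotifyInstanceStopped(ctx context.Context, instance chan<- string) {
	msgs, errs := p.Client.Events(ctx, types.EventsOptions{})
	for {
		select {
		case msg := <-msgs:
			instance <- msg.Actor.Attributes["name"]
		case err := <-errs:
			log.Printf("docker events error: %v", err)
		case <-ctx.Done():
			return
		}
		// Proposed change: pause 100 ms per iteration so the loop cannot
		// spin at 100% CPU if one of the channels becomes permanently ready
		// (for example, a closed channel that always yields immediately).
		time.Sleep(100 * time.Millisecond)
	}
}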

acouvreur commented 3 months ago

Hello @Sblop!

Thank you for your contribution!


Did you successfully run this version of the code, and did you see a significant reduction in CPU usage? I can produce a Docker image for you to try if needed.


Can you please give more details on why you think this is the root cause of the CPU usage?

Because I believe a single call is made, and then it waits for events, right?

So how come the select statement is looping out of control?

If there is no default case and none of the channel operations is ready, the select statement blocks until at least one of the cases is ready.
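As a standalone illustration of that blocking behaviour (not sablier code): with no default case, the select below simply parks the goroutine until the channel delivers, consuming no CPU while waiting.

package main

import (
	"fmt"
	"time"
)

func main() {
	events := make(chan string)

	go func() {
		time.Sleep(2 * time.Second)
		events <- "container stopped"
	}()

	// No default case: this select blocks until a channel operation is
	// ready, so the goroutine sleeps instead of burning CPU while waiting.
	select {
	case e := <-events:
		fmt.Println("received:", e)
	}

	// By contrast, a select with a `default` case never blocks; wrapped in
	// a for-loop it would spin at 100% CPU unless a delay is added.
}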

Sblop commented 3 months ago

In my initial attempt, this seemed to bring the CPU down, but I believe I was mistaken... After the first container has been started, the CPU stays high when using the DOCKER_HOST env variable. The debug log stays the same:

time="2024-06-06T12:13:16+02:00" level=debug msg="map[]"
time="2024-06-06T12:13:18+02:00" level=debug msg="map[]"
time="2024-06-06T12:13:20+02:00" level=debug msg="map[]"
time="2024-06-06T12:13:22+02:00" level=debug msg="map[]"
time="2024-06-06T12:13:24+02:00" level=debug msg="map[]"
time="2024-06-06T12:13:26+02:00" level=debug msg="map[]"
time="2024-06-06T12:13:28+02:00" level=debug msg="map[]"
time="2024-06-06T12:13:30+02:00" level=debug msg="map[]"
time="2024-06-06T12:13:32+02:00" level=debug msg="map[]"
time="2024-06-06T12:13:34+02:00" level=debug msg="map[]"

Do you have any suggestions as to what might cause this?

The log from the docker socket proxy is this:

192.168.77.4 - - [06/Jun/2024:10:18:56 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:18:58 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:00 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:02 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:04 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:06 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:08 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:10 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:12 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:14 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:16 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:18 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:20 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:22 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
192.168.77.4 - - [06/Jun/2024:10:19:24 +0000] "GET /v1.41/containers/json?all=1&filters=%7B%22label%22%3A%7B%22sablier.enable%3Dtrue%22%3Atrue%7D%7D HTTP/1.1" 200 3 "-" "Go-http-client/1.1"

So Sablier seems very busy chatting with Docker... Do you have any suggestions on how to solve this?

acouvreur commented 3 months ago

I believe these logs come from the group discovery, which queries all your containers every 2 seconds.

Check this:

https://github.com/acouvreur/sablier/blob/main/app/sessions/sessions_manager.go#L64

And

https://github.com/acouvreur/sablier/blob/main/app/sessions/groups_watcher.go
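For illustration, a ticker-based watcher of the kind those files implement typically looks roughly like the sketch below. Only the 2-second interval comes from the discussion above; the function and callback names are hypothetical, not the actual sablier code.

package sessions

import (
	"context"
	"log"
	"time"
)

// watchGroups polls on a fixed interval; refreshGroups is a hypothetical
// callback standing in for the actual group-discovery call.
func watchGroups(ctx context.Context, refreshGroups func(ctx context.Context) error) {
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if err := refreshGroups(ctx); err != nil {
				log.Printf("group discovery failed: %v", err)
			}
		}
	}
}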

acouvreur commented 3 months ago

If you can enable some monitoring and send a report of the CPU usage, that could be very interesting.

I can try to add that kind of performance check in the future. It might be interesting to see the impact of, let's say, a Golang upgrade, a Docker package update, or similar.

acouvreur commented 3 months ago

Hello @Sblop, I will close this PR in favor of #329.

You can also take a look at #272 for the whole investigation.