JonasProgrammer / docker-machine-driver-hetzner

Docker machine driver for the new hetzner cloud API
https://jonasprogrammer.github.io/docker-machine-driver-hetzner/
MIT License

Lots of machines in cloud but only a few in docker-machine #115

Open buffcode opened 1 year ago

buffcode commented 1 year ago

We are currently running 4.1.0 (I will upgrade later today), and for several versions now we have had the problem that docker-machine creates servers but somehow fails to remember them.

I recently manually deleted about 30 servers in Hetzner cloud that weren't known to docker-machine ls (anymore?) but were definitely created this way.

We are using docker-machine to spin up cloud runners for GitLab CI, so every runner has a fixed prefix and is easily recognizable.

Is there a way to sync docker-machine with hetzner cloud, so that these servers get picked up again? Or that docker-machine recognizes those unmanaged machines and removes them? This is filling our resource limits and bills as well :)

Can I provide logs (which?) to debug this? This usually stacks up over multiple weeks and does not happen on a daily basis.

buffcode commented 1 year ago

After upgrading to 5.0.1 and creating all of the missing servers:

runner-ovfjcph1-runner-1700632818-61a5a758   -        hetzner   Error                                         Unknown    could not execute drivers.MustBeRunning: could not get server by ID: limit of 5000 requests per hour for XXXX:XXXX:c0c:b1cc::1 reached (rate_limit_exceeded)
runner-ovfjcph1-runner-1700639271-6e1e6488   -        hetzner   Error                                         Unknown    could not execute drivers.MustBeRunning: could not get server by ID: limit of 3600 requests per hour reached (rate_limit_exceeded)

Maybe this also affects which machines/states are known on both sides?

buffcode commented 1 year ago

Now that the API is accessible again, I can confirm that docker-machine and Hetzner cloud are out of sync.

docker-machine reports 19 servers, while Hetzner currently has 42.

JonasProgrammer commented 1 year ago

Hi,

sorry for only getting back to you now; I was dealing with some medical issues.

It is indeed possible for Hetzner and the driver to get out of sync. docker-machine implements a rather basic RPC protocol, and the server creation logic boils down to three steps: a pre-create check (which tries, on a best-effort basis, to ensure that machine creation should succeed), the actual creation, and then waiting for the machine to come up. Depending on which step fails, docker-machine may conclude the machine has not been created and decide to remove the files; the driver, on the other hand, only performs a tear-down during the creation steps.
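
To illustrate how the two sides can drift apart, here is a minimal, self-contained simulation of the flow described above. It is only a sketch and not the actual code of docker-machine or the driver: `FakeCloud`, `create_machine`, and the `fail_at` parameter are hypothetical names invented for this example, and the cleanup behavior is a simplified reading of the description.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Stand-in for the provider side: the servers that really exist."""
    servers: set = field(default_factory=set)


def create_machine(name, cloud, local_store, fail_at=None):
    """Simulate pre-create check, creation, and waiting for boot.

    fail_at is a hypothetical knob: None, "precreate", "create" or "wait".
    Returns True if the machine ends up tracked locally.
    """
    local_store.add(name)  # local files exist as soon as creation starts
    try:
        if fail_at == "precreate":
            raise RuntimeError("pre-create check failed")
        if fail_at == "create":
            raise RuntimeError("create call failed")
        cloud.servers.add(name)  # the server now exists at the provider
        if fail_at == "wait":
            # e.g. a rate-limited status poll while waiting for the server
            raise RuntimeError("rate_limit_exceeded while waiting")
        return True
    except RuntimeError:
        # The local side concludes the machine was never created and drops
        # its files. In this simplified model nothing removes the server
        # once the creation step itself has already succeeded.
        local_store.discard(name)
        return False


cloud, local = FakeCloud(), set()
create_machine("runner-a", cloud, local)                  # succeeds
create_machine("runner-b", cloud, local, fail_at="wait")  # fails late
print(sorted(cloud.servers - local))  # ['runner-b']
```

In this model, a failure in the waiting step leaves `runner-b` running at the provider while the local store no longer knows about it, which matches the symptom reported in this issue.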

Unfortunately, the setup process is wonky and inherently racy. There are some options to configure retry behavior, intended specifically for dealing with rate-limiting issues, but there is still no guarantee. The best thing I can recommend is to check the servers manually after an abnormal creation failure, perhaps tagging them beforehand so they are easier to identify. I run into this problem myself when terminating docker-machine prematurely during development, which sometimes leaves resources behind (including running servers). It's annoying, but so far the manual approach described above is the best thing I could come up with.
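
The manual check can be made a little less tedious with a small comparison script. The following is a minimal sketch, assuming you have already obtained two lists of names: the machines docker-machine knows about (for example from `docker-machine ls -q`) and the servers that exist in the Hetzner project (from the console or the API). The function name `find_orphans` and the sample names other than the two from the output above are made up for illustration, and the fetching itself is deliberately left out.

```python
def find_orphans(cloud_names, machine_names, prefix):
    """Return cloud servers that carry the runner prefix but are unknown locally.

    cloud_names:   server names that exist at the provider
    machine_names: names docker-machine still tracks
    prefix:        fixed prefix that identifies servers created by the runner
    """
    known = set(machine_names)
    return sorted(
        name
        for name in cloud_names
        if name.startswith(prefix) and name not in known
    )


if __name__ == "__main__":
    cloud = [
        "runner-ovfjcph1-runner-1700632818-61a5a758",
        "runner-ovfjcph1-runner-1700639271-6e1e6488",
        "runner-ovfjcph1-runner-0000000000-example",  # hypothetical leftover
        "database-1",                                 # unrelated server, ignored
    ]
    machines = [
        "runner-ovfjcph1-runner-1700632818-61a5a758",
        "runner-ovfjcph1-runner-1700639271-6e1e6488",
    ]
    for name in find_orphans(cloud, machines, "runner-ovfjcph1-"):
        print(name)
```

The script only reports candidates and does not delete anything on purpose: a server that is still being created may not show up in docker-machine yet, so it is safer to review the list before removing anything.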