JonasProgrammer / docker-machine-driver-hetzner

Docker machine driver for the new hetzner cloud API
https://jonasprogrammer.github.io/docker-machine-driver-hetzner/
MIT License
431 stars 53 forks

Problem with Rancher when scaling up #122

Open michafn opened 4 months ago

michafn commented 4 months ago

Version: 5.0.2, Rancher: 2.8.0

Hello,

we have been using Rancher in combination with the Hetzner node driver for at least two years now and it worked entirely fine. However, all of a sudden it seems we cannot scale up any nodes anymore.

During creation the process stops, throwing:

Flag provided but not defined: -hetzner-wait-on-error:Timeout waiting for ssh key.

I tried to add to the node template: --hetzner-wait-on-error=40 --hetzner-wait-on-polling=40

to the Engine options, with no effect. Likewise, I added it as an environment variable, with no effect. I assume that the API is requesting further parameters which I cannot properly pass, but I have no idea at the moment where to start investigating.
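For what it's worth, the wording of that error matches what Go's standard `flag` package reports when a binary is handed a flag it never registered, so one possible reading is that the driver binary actually being run does not know `--hetzner-wait-on-error`. Below is a minimal sketch of that mechanism only. The helper `parseWithFlags` and the two flag lists are hypothetical illustrations and are not taken from the driver's source.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseWithFlags is a hypothetical helper: it registers only the flag names
// in `defined` on a fresh flag set, then parses `args` and returns whatever
// error the standard flag package reports.
func parseWithFlags(defined []string, args []string) error {
	fs := flag.NewFlagSet("driver", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // suppress the automatic usage dump
	for _, name := range defined {
		fs.String(name, "", "illustrative flag")
	}
	return fs.Parse(args)
}

func main() {
	// The options from the node template in this report.
	args := []string{"--hetzner-wait-on-error=40", "--hetzner-wait-on-polling=40"}

	// Assumed flag set of a binary that does not define the wait flags.
	without := []string{"hetzner-api-token"}
	fmt.Println("without:", parseWithFlags(without, args))

	// Assumed flag set of a binary that does define them.
	with := []string{"hetzner-api-token", "hetzner-wait-on-error", "hetzner-wait-on-polling"}
	fmt.Println("with:", parseWithFlags(with, args))
}
```

In the first case the parse fails with `flag provided but not defined: -hetzner-wait-on-error` (the flag package always prints a single dash, regardless of how the flag was passed); in the second it returns `<nil>`. If this is what is happening here, the thing to check would be which driver binary the affected cluster actually runs, not the options passed to it.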

What I can say is that the machines are not even created at Hetzner anymore. This used to work perfectly well, but for some reason it no longer does. I also set up a new API token to rule that out as the cause, but it had no effect.

Any ideas? Anyone had similar issues? This is having a dramatic impact on our cluster, since we cannot scale anymore at all while manually adding new nodes is not possible either...

Any sort of help appreciated!

michafn commented 4 months ago

Addition: another issue we came across before it stopped working entirely: when creating a new node, the internal IP was used as the external IP, so both the external and the internal IP ended up being the internal IP. I assume this could be some sort of mapping issue? Has anyone else seen this?

michafn commented 4 months ago

Update: I tried to spin up a new cluster with the Hetzner node driver, which actually works as expected. So it seems that the issue lies in the cluster itself. The core question would then be: why does creating SSH keys and machines work for cluster B but not for cluster A? This seems quite irrational, since the driver configuration is exactly the same for both. How can that be? Even when using the same node template, there can't be many differences, or am I missing something important here?

elderapo commented 1 month ago

Do you recall what version you had before updating to 5.0.2?

I experienced the same issue when upgrading 3.13.0 => 5.0.2. Setting the version back to 3.13.0 didn't help. I had to restore my Rancher instance from a backup and only then do a version-by-version update (testing scaling up/down after each one), which got the "old cluster" working all the way up to 5.0.2.