JonasProgrammer / docker-machine-driver-hetzner

Docker machine driver for the new hetzner cloud API
https://jonasprogrammer.github.io/docker-machine-driver-hetzner/
MIT License

Rancher - error scaling up servers #116

Closed: smirnov-mi closed this issue 11 months ago

smirnov-mi commented 12 months ago

Hi Jonas

I've been using the Hetzner driver for many months, going through numerous versions. I'm using the driver in Rancher, and a few days ago I updated it from v3.11.x to 5.0.1. Scaling down doesn't remove the virtual machine itself (I will address that in a separate issue). When I try to scale up, I get the following error:

[cmdCreateInner] error setting machine configuration from flags provided: --hetzner-image and --hetzner-image-id are mutually exclusive:Timeout waiting for ssh key

Rancher v2.7.9 (upgraded several times)

The node driver is configured as follows:
- Download URL: https://github.com/JonasProgrammer/docker-machine-driver-hetzner/releases/download/5.0.1/docker-machine-driver-hetzner_5.0.1_linux_amd64.tar.gz
- UI: https://storage.googleapis.com/hcloud-rancher-v2-ui-driver/component.js
- Whitelist domains: storage.googleapis.com

JonasProgrammer commented 12 months ago

Hi,

this error is caused by the verification in https://github.com/JonasProgrammer/docker-machine-driver-hetzner/blob/54b321f1016b4e07a291c3e01525061d5a47655a/driver/flag_processing.go#L26, which runs on machine creation after all flags (or environment variables, or whatever the RPC interface passes down) have been processed. Rancher probably uses the existing machine configuration alongside the selected image to generate the flag set, which somehow causes both of them to be passed. I did check the places where ImageID is set, however, and at least the driver itself does not 'recycle' ImageID after the name lookup has been performed, so the problem can only come from the way Rancher interacts with it.

The whole --x-id versus --x flag situation is an unfortunate leftover from the very first days of the driver (before hcloud-go even existed). Given that --x nowadays works with IDs just fine, I am somewhat tempted to drop the --x-id flags in the future, but this may cause trouble with older docker-machine configurations. A workaround that picks up ImageID when it is present in the stored JSON, without requiring a flag, may be possible, though.
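The following hypothetical sketch illustrates the idea of a single --hetzner-image value that accepts either a name or a numeric ID, with an ImageID from a stored machine JSON honored only when no flag value is given. `resolveImage`, `imageRef` and the default name passed in are illustrative assumptions, not existing driver or hcloud-go APIs.

```go
package main

import (
	"fmt"
	"strconv"
)

// imageRef is a hypothetical resolved image reference: exactly one of
// ID or Name is meaningful.
type imageRef struct {
	ID   int64
	Name string
}

// resolveImage sketches how one --hetzner-image value could cover both
// names and numeric IDs, with a legacy ImageID (e.g. read from an older
// machine JSON) used only when no flag value was supplied.
func resolveImage(flagValue string, storedImageID int64, defaultImage string) imageRef {
	switch {
	case flagValue != "":
		// A purely numeric, positive value is treated as an image ID.
		if id, err := strconv.ParseInt(flagValue, 10, 64); err == nil && id > 0 {
			return imageRef{ID: id}
		}
		// Anything else is treated as an image name.
		return imageRef{Name: flagValue}
	case storedImageID != 0:
		// Honor an ImageID persisted by an older driver version.
		return imageRef{ID: storedImageID}
	default:
		// Nothing given: fall back to the (assumed) default image name.
		return imageRef{Name: defaultImage}
	}
}

func main() {
	const defaultImage = "example-default-image" // placeholder, not the driver's real default
	fmt.Println(resolveImage("1234", 0, defaultImage))
	fmt.Println(resolveImage("ubuntu-18.04", 0, defaultImage))
	fmt.Println(resolveImage("", 5678, defaultImage))
	fmt.Println(resolveImage("", 0, defaultImage))
}
```

With this shape, a separate --hetzner-image-id flag is no longer needed for new configurations, while an ID stored by an old configuration is still respected.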

Do you know whether Rancher passes any additional environment variables or similar down to the driver? In that case, a fix that bypasses the mutual-exclusivity check only when running under Rancher may be possible.

smirnov-mi commented 12 months ago

Thanks for looking into this. Unfortunately, I don't know whether Rancher passes anything additional to the driver; I hadn't dug that deep into this particular step until now.

smirnov-mi commented 11 months ago

I can't scale up my server groups anymore, even after "downgrading" to the previous driver version, 3.11.0. I tried several versions, all with the same error.

JonasProgrammer commented 11 months ago

Thinking about it a little more, it could be due to how the default image was implemented in older driver versions, combined with recent updates to the default image name. Could you check whether the current pre-release works for you?

If that does not help either, could you somehow export the configuration Rancher is storing (obviously stripping anything token-like) along with a full call log? I'm not familiar with Rancher, though, so I cannot really assist there.

smirnov-mi commented 11 months ago

While trying to fix my cluster and my Rancher, I ruined the Rancher installation 🥇 Maybe because it's about 4 years old and has been upgraded 20 times, from v2.4.x up to the current 2.7.9.

So I set up a brand-new Rancher, and driver v5.0.1 works fine, including installing, scaling up, scaling down and deleting (auto-installing a new) nodes.

I don't have any errors now, so I won't be able to fully verify the 5.0.2+beta version with regard to this issue :-(

Would you mind giving the 5.0.2+beta release a conventional name (perhaps without the "+")? Rancher cannot import the driver from the address above. Then I could at least test the standard functionality.

JonasProgrammer commented 11 months ago

Sorry to hear that, I was unable to act faster as I had some other errands to take care of :/

The proposed 5.0.2 just makes the flag detection logic more lenient. As a little background: old driver versions implemented the default image by default-initializing it, so even when no --hetzner-image flag was passed, an image name was already present (and an ID simply took precedence if given). So, especially given your timescale of four years, it is very plausible that the configuration actually contained something like {"Image": "ubuntu-18.04", "ImageID": 1234}. Newer driver versions leave the image empty and just fall back to the default during lookup, so this is no longer the case. However, to allow an existing machine configuration to keep working, there is an escape hatch: the flag validation fails only if both ID and name are given and the name is not the default image. That escape hatch only recognized the current default name, though, so a configuration carrying an older default image name still failed. 5.0.2 attempts to correct this oversight by including all default image names down to the original Debian 9.

If your problem is solved, I would refrain from releasing 5.0.2 for now and delete the pre-release; it was only intended for testing. I will include the more lenient checks in a later release eventually.

smirnov-mi commented 11 months ago

Not critical at all. Thanks for your support!