JonasProgrammer / docker-machine-driver-hetzner

Docker machine driver for the new hetzner cloud API
https://jonasprogrammer.github.io/docker-machine-driver-hetzner/
MIT License

Provide fallback server type #111

Open fschrempf opened 1 year ago

fschrempf commented 1 year ago

Sometimes creating servers of a certain type fails because of availability issues at Hetzner, while using other server types still works just fine. For these cases it would be great to have a way of specifying a fallback server type that should be tried if the primary server type doesn't work.

JonasProgrammer commented 1 year ago

Hi,

given that there were some issues regarding availability, I think the basic suggestion is good.

However, I would like some more discussion on how exactly you think the fallback behavior should be triggered. Will the server type lookup fail fast? Or do we need a backup strategy for when the server creation itself fails?

Also, while the driver does not look at the server type after creation, it is saved to the machine JSON. Some downstream software might use this to gather information about the server and would see a discrepancy.

So while the basic idea sounds good, there are some points that need further discussion. Rushing things would lead to something that does more harm than good due to the potential obfuscation included.

inakimalerba commented 1 year ago

Hello!

I would very much like this.

Currently I have a very-custom-not-upstreamable POC patch that adds a feature compatible with this request. I tested replacing the server type and location options with a combined parameter that lets you set a comma-separated list of server_type::location options, which the runner tries one by one until it can successfully place a server.

For example: --server-type-location cx21::fsn1,cx21::nbg1,cpx21::fsn1,cpx21::nbg1

The approach was to try them one by one, replacing the properties in the class and retrying the creation, so that the downstream code sees the correct values.

Would something like this work? One of the pain points with this approach was backwards compatibility.

JonasProgrammer commented 1 year ago

Hi,

thanks for the input!

> Currently I have a very-custom-not-upstreamable POC patch that adds a feature compatible with this request. I tested replacing the server type and location options with a combined parameter that lets you set a comma-separated list of server_type::location options, which the runner tries one by one until it can successfully place a server.
>
> For example: --server-type-location cx21::fsn1,cx21::nbg1,cpx21::fsn1,cpx21::nbg1

I really like the idea. One potential problem I see is getting the image architecture right, as we might have to switch it on the fly if the chain of alternatives contains some ARM servers. Though we could simply declare such behavior undefined and require that all types specified have the same architecture.

> Would something like this work? One of the pain points with this approach was backwards compatibility.

That is the biggest pain point to me. The flags are not that much of a problem IMO, we could make do with mutual exclusion. I do see a problem in terms of configuration vs. execution: The way I understand it, the driver is not supposed to mutate the configuration after SetConfigFromFlags. But there is upstream tooling that depends on the image, type, ... fields being filled.

I currently don't really see a 'nice' way to get this in line with the normal server creation. Perhaps one could query all possible combinations in SetConfigFromFlags and use the first working configuration -- though I have not checked if the API can even report this without actually creating a server. Even then, it would be susceptible to TOCTOU (time-of-check to time-of-use) races.

Yet another point to consider is whether the alternatives should be stored as part of the machine state, i.e. should docker machine recreate run through all combinations again or should it use the working configuration initially detected.

I still like this idea, after all, but it really is a can of worms to me. The more light I try to shed on this, the more potential problems arise. As of now, I'm unsure whether this should be an intrinsic feature of the driver or whether some example bash script on how to run docker-machine through a list of combinations with reasonably low timeouts/retries would suffice. I'm always open to new ideas though.