confluentinc / ducktape

System integration and performance tests

Add SSH failure callbacks #287

Closed imcdo closed 2 years ago

imcdo commented 2 years ago

One common point of failure for ducktape is paramiko. When a remote account fails, we generally get an unhelpful error message that is not tied to the node being run on. This PR addresses the issue by adding a flag to ducktape that lets you specify a function, taking an error and a remote account, that is then run on an SSH failure. For instance, if you are running AWS instances, you can write your own AWS check that validates that the node is still running on AWS, and have it run on an SSH failure.
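As a rough illustration of the callback shape described above, here is a minimal sketch. Everything named in it is hypothetical and not taken from this PR: `FakeRemoteAccount` stands in for the remote account object (assumed here to expose a `hostname`), `is_node_running` is a user-supplied check (for AWS it might wrap an instance status query), and `run_with_callback` only approximates how a caller could invoke the callback when an SSH call fails.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeRemoteAccount:
    """Hypothetical stand-in for the remote account passed to the callback."""
    hostname: str


class NodeUnavailableError(Exception):
    """Raised by the callback to attach node context to an SSH failure."""


def make_ssh_failure_callback(is_node_running: Callable[[str], bool]):
    """Build a callback that takes an error and a remote account.

    is_node_running is supplied by the user, who knows the environment
    (cloud provider, docker, etc.); this sketch does not implement it.
    """
    def on_ssh_failure(error: Exception, remote_account) -> None:
        host = remote_account.hostname
        if not is_node_running(host):
            raise NodeUnavailableError(
                f"SSH to {host} failed and the node is no longer running: {error!r}"
            ) from error
        # The node is still up, so return and let the original error propagate.

    return on_ssh_failure


def run_with_callback(ssh_call, remote_account, callback):
    """Assumed invocation pattern: run the callback only when the SSH call fails."""
    try:
        return ssh_call()
    except Exception as error:
        callback(error, remote_account)
        raise  # re-raise the original error if the callback did not raise
```

With this shape, a dead node surfaces as a `NodeUnavailableError` naming the host, while a failure on a healthy node keeps its original SSH error.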

imcdo commented 2 years ago

> Does this actually need to be configurable, or is it something that can be baked in? Even if just providing a few options ootb?

I think this is better left configurable, as ducktape isn't responsible for the environment at all and operates in the dark outside of SSH access. Users are responsible for creating the environment, so relying on them to pass in the validation that runs when SSH fails gives a better understanding of the environment.

> The main thing I can think of would be checks against e.g. a cloud provider, but since that info is injected after allocation atm, a check like that would need more substantial changes.

As ducktape only knows about SSH, the point here is to allow these kinds of checks against cloud providers, Docker, etc. Because this is configurable, whoever writes the validator will have the necessary info, so ducktape doesn't need to know.