basecamp / kamal

Deploy web apps anywhere.
https://kamal-deploy.org
MIT License
9.38k stars 357 forks source link

fix: Ensure SSH connections are closed after each command execution #859

Open jfanals opened 1 week ago

jfanals commented 1 week ago

This commit addresses an issue where SSH connections were not being properly closed after each command execution, leading to timeout errors during the kamal deploy process.

Problem

When executing commands using the execute method in SSHKit::Runner::Parallel, SSH connections were left open, causing subsequent commands to fail with timeout errors. This issue was particularly evident when running kamal build deliver after kamal registry login during the normal kamal deploy process

Solution

The execute method in the SSHKit::Runner::Parallel::CompleteAll module has been modified to ensure that all SSH connections are closed after each command execution. This is achieved by adding an ensure block to the thread creation logic, which calls SSHKit::Backend::Netssh.pool.close_connections after each command, regardless of whether an exception occurs.

Changes

Impact

This change ensures that SSH connections are properly closed, preventing timeout errors and improving the reliability of the kamal deploy process.

Testing

Tested the changes by running kamal deploy commands. Verified that the timeout errors no longer occur and the deployment process completes successfully.

Closes #857

jfanals commented 1 week ago

For some strange reason the test on Ruby 3.1 failed, it looks like it might have been just a glitch in the test procedures as the code should not affect only one particular version of ruby but all versions.

        ensure
          SSHKit::Backend::Netssh.pool.close_connections

Would it be possible to run the tests again?

djmb commented 1 week ago

I've kicked that off. I'm not sure though about this change. Creating the SSH connections can be expensive especially with large numbers of servers to deploy to.

We configure keepalives with an interval of 30 seconds on the connections, so that generally should stop them from timing out. Maybe you could try reducing the keepalive interval and see if that makes any difference?

jfanals commented 1 week ago

Thanks for the suggestion, I did try to decrease the .ssh/config ServerAliveInterval to 10, unfortunately that did not fix the issue.