capistrano / sshkit

A toolkit for deploying code and assets to servers in a repeatable, testable, reliable way.
MIT License
1.15k stars 253 forks source link

Fix Disabling of Connection Pool #448

Closed tisba closed 5 years ago

tisba commented 5 years ago

Fixes https://github.com/capistrano/sshkit/issues/438

The README currently states, that:

SSHKit::Backend::Netssh.pool.idle_timeout = 0

will disable the cache and this can be used to mitigate potential problems from connection pooling. As described in #438 quite the opposite was the case. Connections were not cached, BUT the cache eviction loop was still started. With idle_timeout = 0 the the eviction loop becomes a busy loop hogging one core at 100%.

This PR will skip the launch of the eviction loop (via run_eviction_loop) in case the cache is disabled. If the cache is disabled, there can't be any timed_out_connections because silently_close_connection_later is not used.

find_cache does not use thread_safe_find_or_create_cache if cache is disabled:

https://github.com/capistrano/sshkit/blob/ac181607508de1f1dbc8cb3b6df963ed343f5418/lib/sshkit/backends/connection_pool.rb#L98-L106

I'd like to get some guidance on how to test this (@leehambley maybe?).

Further Suggestion

I also would like to propose that there is a maximum eviction frequency. The idle_timeout should not guarantee when idle connections are removed. Rather it should be considered "after idle_timeout sshkit will evict the connection eventually; never before". If you agree with this, we could simplify the sleep from this:

https://github.com/capistrano/sshkit/blob/ac181607508de1f1dbc8cb3b6df963ed343f5418/lib/sshkit/backends/connection_pool.rb#L129-L139

to

sleep 5
capistrano-bot commented 5 years ago
1 Warning
:warning: There are code changes, but no corresponding tests. Please include tests if this PR introduces any modifications in behavior.

Generated by :no_entry_sign: Danger

leehambley commented 5 years ago

@tisba thanks for the submission, this seems like a great fix. It's dirt simple, and the explanation in the PR is longer than the code sample, which inspires confidence.

I've requested a review from @mattbrictson if he's OK with it, let's roll it and get it out in the next release!

tisba commented 5 years ago

Nice!

Do you want to me to update the changelog, as the nice @capistrano-bot mentioned? :)

Also, what about the other suggestion regarding the slight adjustment of the eviction loop as well?

leehambley commented 5 years ago

Do you want to me to update the changelog, as the nice @capistrano-bot mentioned? :)

Probably a good idea!

Also, what about the other suggestion regarding the slight adjustment of the eviction loop as well?

That concerns me somewhat more than this, lets keep them separate? @mattbrictson what do you think?

mattbrictson commented 5 years ago

Also, what about the other suggestion regarding the slight adjustment of the eviction loop as well?

I think it is safe to include. Just to clarify, we're talking about changing this line:

 sleep([idle_timeout, 5].min)

to be:

sleep(5)

correct?

There are code changes, but no corresponding tests. Please include tests if this PR introduces any modifications in behavior.

I am OK with accepting the PR without tests.

Thanks so much for this PR! 🙏

tisba commented 5 years ago

This is exactly the change I would like to suggest.

With https://github.com/capistrano/sshkit/pull/448/commits/162cbfd503e039c1b2b9e2e39ad7659b22e88ab9 the eviction of timed out connection could obviously take a couple of seconds, but (like I already outlined) the semantics of "will eventually be closed/removed" is not something that users of sshkit should be surprised by.

leehambley commented 5 years ago

Danke für den Einsatz!

Thanks for your contribution, may your SideKiq workers spin less frequently on a busy loop! 🏆

mattbrictson commented 5 years ago

We generally publish new SSHKit releases to rubygems every six weeks, and since the most recent release was just a few days ago, that would mean this fix doesn't get published until about 5 weeks from now. But I'm happy to make exceptions for important bug fixes. Does this qualify?

tisba commented 5 years ago

IMO this is a nasty bug, even though sshkit users might run into it only extreme rare cases. What makes it "worse" IMO, is that the current README (regarding the ability to disable the connection cache) directly leads to the issue fixed by this PR.

On the other hand the mitigation (once you've figured out what is going on), is rather trivial: Don't disable the connection pool.

Me personally I'm okay with not rushing this.

mattbrictson commented 5 years ago

Actually, I'll make a release today. These sorts of major bug fixes don't happen very often.

mattbrictson commented 5 years ago

🚀 released as 1.18.2