Open Justin-Walsh opened 4 years ago
Additional affected customer: https://secure.helpscout.net/conversation/1202457431/66524?folderId=3767295 (internal only)
This customer is also reporting that they must restart the Octopus service in order to cancel a task affected by this bug.
One more affected: https://secure.helpscout.net/conversation/1302390635/71528/
Also reporting they must restart the tentacle.
Additional Report: https://octopus.zendesk.com/agent/tickets/85960 [Internal link]
Additional internal report: https://octopusdeploy.slack.com/archives/C01HZFJRYSH/p1667864156526899 [Internal link]
Canceling the task can leave the server task stuck in the Cancelling state until the sftp process on the tentacle is killed, so the workaround originally provided in this issue may no longer be correct. Updated the workarounds to include killing the SFTP process on the tentacle.
Prerequisites
The bug
After making a successful SSH connection, the subsequent SFTP ~connection~ operation does not appear to have a timeout (the 'connection' has a default timeout, but the 'operation' as a whole, in this case a file upload, does not). If the SFTP session hangs or another connection interruption occurs, the task can wait indefinitely to complete, blocking subsequent deployments.
What I expected to happen
SFTP ~connections~ operations should time out after a sensible amount of time.
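To illustrate the difference between a connection timeout and an operation timeout, here is a minimal shell sketch. It is not how Octopus performs the upload internally; it only shows the expected behaviour using the standard `timeout` utility. The function name `run_with_deadline`, the 10-second kill grace period, and the host and batch file in the commented usage line are all hypothetical.

```shell
# Hypothetical helper: run a command under an overall deadline (in seconds).
# If the command is still running at the deadline it is sent SIGTERM, and
# SIGKILL 10 seconds later if it has not exited (the -k grace period is an
# assumption chosen for illustration).
run_with_deadline() {
  limit="$1"
  shift
  timeout -k 10 "$limit" "$@"
}

# Example usage (hypothetical host and batch file):
#   ConnectTimeout bounds only the SSH connection attempt, while
#   run_with_deadline bounds the whole transfer.
# run_with_deadline 300 sftp -o ConnectTimeout=10 -b upload.batch user@target-host
```

With coreutils `timeout`, an exit status of 124 indicates the deadline was reached, so a caller can tell a timed-out transfer apart from one that failed for another reason.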
Affected versions
Octopus Server: (At least) 2019.9.10 onwards
Workarounds
On the tentacle, run
kill -9 <pid>
for each sftp process ID found via ps aux.
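The workaround can be sketched as a small shell helper. This is an assumption-laden sketch rather than an official script: the function name `sftp_pids` is hypothetical, and it assumes the usual Linux `ps aux` layout in which the PID is column 2 and the command starts at column 11. It matches only on the command name, so the helper's own `awk` process is not picked up.

```shell
# Hypothetical helper: read `ps aux`-style output on stdin and print the PID
# of every process whose command name contains "sftp" (for example sftp or
# /usr/lib/openssh/sftp-server). Assumes PID in column 2, command in column 11.
sftp_pids() {
  awk 'NR > 1 && $11 ~ /sftp/ { print $2 }'
}

# Review the matching PIDs first:
#   ps aux | sftp_pids
# Then kill them (left commented out because kill -9 is not reversible):
#   for pid in $(ps aux | sftp_pids); do kill -9 "$pid"; done
```

Listing the PIDs before killing them makes it easier to avoid terminating an unrelated transfer on a shared machine.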
Links
Initial report: https://help.octopus.com/t/unhealthy-linux-target-blocking-global-heath-check/25296