Justin-Walsh commented 4 years ago

Prerequisites

[x] I have verified the problem exists in the latest version
[x] I have searched open and closed issues to make sure it isn't already reported
[x] I have written a descriptive issue title
[x] I have linked the original source of this report
[x] I have tagged the issue appropriately (area/*, kind/bug, tag/regression?)

The bug

After a successful SSH connection, the subsequent SFTP operation (in this case a file upload) does not appear to have a timeout. The connection itself has a default timeout, but the operation as a whole does not. If the SFTP session hangs or the connection is otherwise interrupted, the task can block subsequent deployments while it waits indefinitely to complete.

What I expected to happen

SFTP operations should time out after a sensible amount of time.
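
To illustrate the difference between a connection timeout and an operation-level timeout, here is a minimal Python sketch. It is not how Octopus Server implements SFTP; the names run_with_timeout and simulated_hung_upload are hypothetical and exist only for this example. The idea is that the caller enforces a deadline on the whole operation, not just on establishing the connection.

```python
import threading
import time


def run_with_timeout(operation, timeout_seconds):
    """Run a blocking callable and raise TimeoutError if it overruns.

    Hypothetical helper for illustration; not part of any Octopus API.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = operation()
        except Exception as exc:  # hand failures back to the caller
            outcome["error"] = exc

    # A daemon thread lets the process exit even if the operation never returns.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_seconds)

    if thread.is_alive():
        # The deadline passed. The hung work is abandoned, not cancelled;
        # a real fix would also close the underlying SFTP session.
        raise TimeoutError(f"operation exceeded {timeout_seconds} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def simulated_hung_upload():
    """Stand-in for an SFTP upload that stalls (assumption for the demo)."""
    time.sleep(5)
    return "uploaded"
```

With a wrapper like this, a call such as run_with_timeout(simulated_hung_upload, 0.1) fails fast with TimeoutError instead of holding up the task queue. Note that the stalled session still has to be torn down separately.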

Affected versions

Octopus Server: 2019.9.10 and later (at least)

Workarounds

Monitor task queue and cancel long-running deployments/healthchecks where appropriate.
Make use of https://github.com/OctopusDeploy/OctopusDeploy-Api/blob/master/REST/PowerShell/Deployments/CancelLongRunningTasks.ps1 to scan and stop long-running tasks.
(2022-11-08) Kill the SFTP process on the deployment target (Tentacle):
- On a Linux Tentacle, connect to the machine via SSH and run kill -9 <pid> for each sftp process ID found via ps aux
- On a Windows Tentacle, RDP into the machine and restart the SFTP process
- Or restart the host machine that the SFTP process is running on
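
The Linux workaround above can be sketched in Python as follows. This is an assumption-laden illustration, not an official Octopus script: the function names are hypothetical, it assumes the standard 11-column ps aux layout, and it matches any process whose command line contains "sftp", so review the matched PIDs before killing anything on a shared machine.

```python
import os
import signal
import subprocess


def find_sftp_pids(ps_output):
    """Return PIDs from `ps aux` output whose command mentions sftp."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        pid, command = fields[1], fields[10]
        if "sftp" in command and pid.isdigit():
            pids.append(int(pid))
    return pids


def kill_sftp_processes():
    """Send SIGKILL (the equivalent of kill -9) to each sftp process found."""
    ps_output = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    for pid in find_sftp_pids(ps_output):
        if pid != os.getpid():  # never kill this script itself
            os.kill(pid, signal.SIGKILL)
```

Calling kill_sftp_processes() on the target requires permission to signal the sftp processes, typically the same user or root.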

Links

Initial report: https://help.octopus.com/t/unhealthy-linux-target-blocking-global-heath-check/25296

donnybell commented 4 years ago

Additional affected customer: https://secure.helpscout.net/conversation/1202457431/66524?folderId=3767295 (internal only)

This customer is also reporting that they must restart the Octopus service in order to cancel a task affected by this bug.

donnybell commented 4 years ago

One more affected: https://secure.helpscout.net/conversation/1302390635/71528/

This customer also reports that they must restart the Tentacle.

Justin-Walsh commented 2 years ago

Additional Report: https://octopus.zendesk.com/agent/tickets/85960 [Internal link]

nathanwoctopusdeploy commented 2 years ago

Additional internal report: https://octopusdeploy.slack.com/archives/C01HZFJRYSH/p1667864156526899 [Internal link]