OctopusDeploy / Issues

Bug reports and known issues for Octopus Deploy and all related tools
https://octopus.com

Allow Health Checks to run in parallel with deployments and runbooks on a tentacle target. #8583

Open LukeButters opened 7 months ago

LukeButters commented 7 months ago

The enhancement

The Need

Currently, health checks on Tentacle targets cannot run at the same time as a deployment. This means the idempotent, read-only health check script, which takes around 1s, can be blocked for hours. The result is either that the health check stays blocked or, with https://github.com/OctopusDeploy/Issues/issues/8118 applied, that the customer is shown reports of the health check failing.

If health checks could run in parallel with deployments then:

This would help in cases where a deployment or health check script hangs indefinitely and so blocks the other indefinitely. For example, a hung health check would no longer block a deployment.

It would also reduce the number of running tasks, and therefore the number of slots taken up in a customer's task cap.

Background

When a deployment script or health check script runs on a Tentacle, a RunningScript mutex is taken on the Tentacle. The deployment typically takes a FullIsolation mutex, while the health check takes a NoIsolation mutex. A FullIsolation mutex cannot be held while any NoIsolation mutex is held for the same name, and a NoIsolation mutex cannot be acquired while a FullIsolation mutex is held, which is why a running deployment blocks the health check.
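The rules above behave like a per-name reader/writer lock: any number of NoIsolation holders can share a name, while a FullIsolation holder excludes everyone else. The Python sketch below models only those rules, assuming this reader/writer reading is accurate. `NamedIsolationMutex`, `try_acquire`, `release` and the enum are hypothetical names for illustration, not Tentacle's actual API, and it uses non-blocking acquisition (rather than waiting) so the blocking relationship is easy to see.

```python
import threading
from enum import Enum


class ScriptIsolationLevel(Enum):
    NO_ISOLATION = "NoIsolation"      # e.g. the default health check script
    FULL_ISOLATION = "FullIsolation"  # e.g. a deployment script


class NamedIsolationMutex:
    """Hypothetical model of the per-name RunningScript mutex rules."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the bookkeeping below
        self._shared = {}               # name -> count of NoIsolation holders
        self._exclusive = set()         # names currently held with FullIsolation

    def try_acquire(self, name, level):
        """Return True if the mutex was taken, False if it would have to wait."""
        with self._guard:
            if name in self._exclusive:
                # Nothing may run alongside a FullIsolation holder.
                return False
            if level is ScriptIsolationLevel.FULL_ISOLATION:
                if self._shared.get(name, 0) > 0:
                    # FullIsolation must wait for every NoIsolation holder.
                    return False
                self._exclusive.add(name)
            else:
                # NoIsolation holders share the name with each other.
                self._shared[name] = self._shared.get(name, 0) + 1
            return True

    def release(self, name, level):
        with self._guard:
            if level is ScriptIsolationLevel.FULL_ISOLATION:
                self._exclusive.discard(name)
            elif self._shared.get(name, 0) > 0:
                self._shared[name] -= 1
                if self._shared[name] == 0:
                    del self._shared[name]


if __name__ == "__main__":
    mutex = NamedIsolationMutex()
    # A deployment script takes FullIsolation on the RunningScript name...
    print(mutex.try_acquire("RunningScript", ScriptIsolationLevel.FULL_ISOLATION))  # True
    # ...so a health check asking for NoIsolation on the same name is refused.
    print(mutex.try_acquire("RunningScript", ScriptIsolationLevel.NO_ISOLATION))  # False
```

In this model, the only way to let a health check overlap a deployment is to change which level one of them requests, since the two levels are mutually exclusive by construction.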

When a deployment is already running a script on a target and a health check is kicked off, the following occurs.

The above assumes https://github.com/OctopusDeploy/Issues/issues/8118 is not applied.

It is not clear why health checks cannot run in parallel with deployments, since health checks (if using the default script) do not modify the Tentacle. It is also not clear whether sending potentially tens of thousands of RPC calls to the Tentacle was intentionally chosen over running the scripts in parallel.

This enhancement could easily be feature-toggled, either with an environment variable or in the machine policy. A toggle may make sense because customers can provide their own custom health check scripts, which, unlike the default script, might modify the Tentacle.
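One way the two toggle levels could combine is sketched below in Python. This is purely illustrative: the environment variable name, the `MachinePolicy` fields and `health_check_may_run_in_parallel` are all hypothetical, not existing Octopus settings. The design choice shown is that an explicit machine policy setting wins, while the environment-variable toggle only applies to the default (read-only) script, so custom scripts stay serialized unless a policy opts them in.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical environment variable name, for illustration only.
PARALLEL_HEALTH_CHECK_ENV_VAR = "OCTOPUS_ALLOW_PARALLEL_HEALTH_CHECKS"


@dataclass
class MachinePolicy:
    # Hypothetical per-policy setting; None means "not set on this policy".
    allow_parallel_health_checks: Optional[bool] = None
    # Whether the policy replaces the default health check script.
    uses_custom_health_check_script: bool = False


def health_check_may_run_in_parallel(
    policy: MachinePolicy, env: Mapping[str, str] = os.environ
) -> bool:
    # An explicit machine policy setting always wins, so customers can opt
    # in or out per policy regardless of the global toggle.
    if policy.allow_parallel_health_checks is not None:
        return policy.allow_parallel_health_checks
    # Otherwise fall back to the environment-variable toggle (off by default)...
    raw = env.get(PARALLEL_HEALTH_CHECK_ENV_VAR, "").strip().lower()
    env_enabled = raw in ("1", "true", "yes")
    # ...but only for the default script, since a custom script might modify the Tentacle.
    return env_enabled and not policy.uses_custom_health_check_script
```

For example, with the environment variable set to `true`, a policy using the default script would allow parallel health checks, while a policy with a custom script would not unless it sets `allow_parallel_health_checks=True`.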

Links

https://github.com/OctopusDeploy/Issues/issues/8581
https://github.com/OctopusDeploy/Issues/issues/8582
https://github.com/OctopusDeploy/Issues/issues/8118
https://octopusdeploy.slack.com/archives/CNHBHV2BX/p1676257279255989
[SC-68672]

pawelpabich commented 7 months ago

It is not clear why health checks cannot run in parallel with deployments, since health checks (if using the default script) do not modify the Tentacle. It is also not clear whether sending potentially tens of thousands of RPC calls to the Tentacle was intentionally chosen over running the scripts in parallel.

This is an assumption we would need to validate. @droyad any insights?

droyad commented 7 months ago

I believe the historic default was that only one thing could run on a target at a time. Over time we've added things that don't need that.

It is not clear why health checks cannot run in parallel

I think this mixes cause and effect. Deployments don't allow anything else to run at the same time, so health checks can't run in parallel with them. It's not that health checks specifically are barred from running when deployments occur.

The current locking mechanism doesn't allow discriminating based on task type, i.e. it's not possible to say that health checks can run but other deployments can't.
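To make "discriminating based on task type" concrete, here is a hypothetical Python sketch of a lock check that looks at task types rather than isolation levels. `TaskType`, the compatibility table and `can_start` are illustrative assumptions, not how Octopus schedules tasks today. The table lets a health check overlap with anything, while deployments and runbooks still cannot overlap with each other.

```python
from enum import Enum, auto
from typing import Iterable


class TaskType(Enum):
    DEPLOYMENT = auto()
    RUNBOOK = auto()
    HEALTH_CHECK = auto()


# Hypothetical compatibility table: each entry is a set of task types that may
# hold the target at the same time. A pair of identical types collapses to a
# one-element set, so frozenset({HEALTH_CHECK}) means "two health checks".
_COMPATIBLE_PAIRS = {
    frozenset({TaskType.HEALTH_CHECK}),
    frozenset({TaskType.HEALTH_CHECK, TaskType.DEPLOYMENT}),
    frozenset({TaskType.HEALTH_CHECK, TaskType.RUNBOOK}),
}


def can_start(requested: TaskType, running: Iterable[TaskType]) -> bool:
    # The new task may start only if it is compatible with every running task.
    return all(frozenset({requested, other}) in _COMPATIBLE_PAIRS for other in running)
```

With this table, `can_start(TaskType.HEALTH_CHECK, [TaskType.DEPLOYMENT])` is allowed while `can_start(TaskType.DEPLOYMENT, [TaskType.DEPLOYMENT])` is not, which is the distinction a lock keyed only on isolation level cannot express.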