MicrosoftDocs / azure-docs

Open source documentation of Microsoft Azure
https://docs.microsoft.com/azure
Creative Commons Attribution 4.0 International

CI/CD Deployment Gate #97041

Open WarrenKin opened 2 years ago

WarrenKin commented 2 years ago

Should the gate that uses the StatusCheck function be checking the "stage" slot? In other words, should it confirm that no in-flight executions are occurring there, since the release pipeline deploys to the "stage" slot and Azure handles the swap to the "production" slot?



MayankBargali-MSFT commented 2 years ago

@WarrenKin Thank you for your feedback! We will review and update as appropriate.

MughundhanRaveendran-MSFT commented 2 years ago

@WarrenKin, actually the StatusCheck function should be checking the production slot. There should also not be any orchestrations running in the staging slot, since the deployment initially happens to the staging slot.

While the current version of your function app is running in your production slot, deploy the new version of your function app to your staging slot. Before you swap your production and staging slots, check to see if there are any running orchestration instances. After all orchestration instances are complete, you can do the swap.

https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-zero-downtime-deployment#status-check-with-slot
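As a rough illustration of the check described above, here is a minimal sketch of the gate decision in plain Python. It does not call the Durable Functions SDK: the function names and the choice of `Pending` and `Running` as the "in-flight" statuses are assumptions made for this example, and the list of statuses is assumed to come from whatever status query your StatusCheck function performs against the production slot.

```python
from typing import Iterable

# Runtime statuses treated as "in flight". This set is an assumption of
# the sketch; adjust it to match what your StatusCheck function queries.
IN_FLIGHT = {"Pending", "Running"}


def has_running(statuses: Iterable[str]) -> bool:
    """Return True if any orchestration instance is still in flight."""
    return any(status in IN_FLIGHT for status in statuses)


def gate_allows_swap(production_statuses: Iterable[str]) -> bool:
    """The gate passes only when production has no in-flight instances."""
    return not has_running(production_statuses)


if __name__ == "__main__":
    # All instances finished: the swap can proceed.
    print(gate_allows_swap(["Completed", "Failed"]))
    # One instance is still running: the gate should hold the release.
    print(gate_allows_swap(["Completed", "Running"]))
```

A release gate would typically poll this result on an interval and only proceed to the swap once it passes.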

Hope this helps! Please let me know if you have any further questions

WarrenKin commented 2 years ago

Thank you for the explanation. I presumed it was a check on the stage slot because some long-running instance could still be running there. Regarding the slot swap, I thought that if you enable auto-swap, the swap would only occur when nothing in the production slot is active. My durable function is continuously active. What is the danger of swapping while active instances are still running, i.e. won't they continue to be processed by the function in the stage slot?

Thanks Warren


MughundhanRaveendran-MSFT commented 2 years ago

@WarrenKin, it is not recommended to keep instances active during the slot swap. The app will keep running, but it is highly likely that an in-flight instance will be killed or lost.

WarrenKin commented 2 years ago

The title of the document is "Zero-downtime deployment for Durable Functions". If the guidance is to wait for all instances to complete, then we are not talking about zero downtime. My durable function runs 24/7, 365 days a year. I could deploy during the night by ending some workflows, but that seems to go against the automated nature of the deployment cycle and CI/CD.

My impression from reading this document and others is that the storage accounts still hold the state for each slot and that the swap does not affect that state, so the instances can still complete successfully. Only the routing changes, not the underlying nature of each slot's function. This has been the case in my testing: I have zero downtime, and after the swap the active executions continue in the stage slot function. Can you refer me to a document that says the swap is not recommended while executions are still active in the slots? Thanks.

MughundhanRaveendran-MSFT commented 2 years ago

@cgillum , Could you please share your insights here?

cgillum commented 2 years ago

If I understand the question correctly, it's the Production slot which needs to be checked for in-flight instances. The assumption is that the staging slot is inactive.

Note that this strategy assumes that your orchestrations are not long-running. If they are long-running or if they are eternal, then it won't be practical to wait for them to finish before doing the swap. In such cases, you can still execute the swap, but you'll need to ensure that your code changes won't break the in-flight instances. See the Versioning documentation for more information about this.
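To illustrate why a code change can break an in-flight instance, here is a toy model of history replay in Python. It is only a sketch and not how the Durable Task Framework is actually implemented: an orchestrator is reduced to an ordered list of activity names, and the step names, the `replay` function, and `NonDeterminismError` are all hypothetical. The idea it shows is that an instance started by the old code carries a recorded history, and new code that schedules different steps in a different order no longer matches that history.

```python
class NonDeterminismError(Exception):
    """Raised when the current code disagrees with the recorded history."""


def replay(steps, history):
    """Replay `history` against `steps` and return the steps still to run."""
    for index, recorded in enumerate(history):
        scheduled = steps[index] if index < len(steps) else None
        if scheduled != recorded:
            raise NonDeterminismError(
                f"history[{index}] is {recorded!r} "
                f"but the code schedules {scheduled!r}"
            )
    return steps[len(history):]


# Hypothetical orchestrator versions, expressed as ordered activity calls.
V1 = ["ReserveStock", "ChargeCard", "SendReceipt"]
V2_BREAKING = ["ValidateOrder", "ReserveStock", "ChargeCard", "SendReceipt"]
V2_COMPATIBLE = ["ReserveStock", "ChargeCard", "SendReceipt", "NotifyWarehouse"]

# An instance started under V1 that has completed only its first step.
IN_FLIGHT_HISTORY = ["ReserveStock"]

if __name__ == "__main__":
    # Appending a step at the end leaves the recorded prefix intact.
    print(replay(V2_COMPATIBLE, IN_FLIGHT_HISTORY))
    # Inserting a step at the start contradicts the recorded history.
    try:
        replay(V2_BREAKING, IN_FLIGHT_HISTORY)
    except NonDeterminismError as error:
        print("replay failed:", error)
```

In this model, a change is safe for in-flight instances only if it leaves the already-recorded prefix of every running instance unchanged.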

WarrenKin commented 2 years ago

> If I understand the question correctly, it's the Production slot which needs to be checked for in-flight instances. The assumption is that the staging slot is inactive.

OK, since my instances can be long-running, I should check that the stage slot is not active when I deploy, i.e. before I destroy it.

> Note that this strategy assumes that your orchestrations are not long-running. If they are long-running or if they are eternal, then it won't be practical to wait for them to finish before doing the swap. In such cases, you can still execute the swap, but you'll need to ensure that your code changes won't break the in-flight instances. See the Versioning documentation for more information about this.

This is the piece that I am confused about. For example, suppose an instance is still active on the production slot when I deploy my new version to the stage slot. The swap then takes place, i.e. the routing is changed so that new triggers run my new function code. Surely the active instance continues and completes in the stage slot (old code), since its state is held in a different storage account from the production slot's? If so, it doesn't matter whether there are breaking changes for any active in-flight instance, because those instances will still use the old code base. If this is not the case, then what is the point of having separate state storage accounts?