We have seen occasional orchestration failures when deploying to our function app, which uses slot deployments. The error message is as follows:
Non-Deterministic workflow detected: A previous execution of this orchestration scheduled an activity task with sequence ID 0 and name 'DetectARInvoiceChanges' (version ''), but the current replay execution hasn't (yet?) scheduled this task. Was a change made to the orchestrator code after this instance had already started running?
The error is intermittent: it occurs on some slot deployments but not all.
Here are some specifics of our setup:
Function app set up for slot deployments, with two slots: staging and production.
Each slot has its own task hubs and orchestration instance / history tables.
During a slot deployment, any number of orchestrations or activities can be executing in the production slot.
When the slot swap occurs during a deployment, there is a brief moment where orchestrations may be resumed on the staging slot, while all previous executions for that orchestration up to that point had occurred on the production slot.
When this occurs, the orchestration execution on the staging slot fails with the error message specified above.
Based on the content of the error message, we believe that this failure occurs while the staging slot is replaying the orchestration history, and it cannot find the scheduled activity tasks that were executed on the production slot.
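To make the check behind this error concrete, here is a minimal toy model in Python. It is only a sketch, not the Durable Task Framework's actual implementation: `TaskScheduled`, `replay`, the two orchestrator versions, and the activity names `PublishChanges` and `LoadSettings` are all hypothetical. It assumes only what the error message states, namely that replay compares the tasks the current code schedules against the tasks recorded in the instance history.

```python
from dataclasses import dataclass


class NonDeterministicWorkflowError(Exception):
    """Raised when replay disagrees with the recorded history (toy model)."""


@dataclass(frozen=True)
class TaskScheduled:
    """One recorded 'activity scheduled' history event (hypothetical shape)."""
    sequence_id: int
    name: str


def replay(history, orchestrator):
    """Re-run `orchestrator` and check that every recorded task is scheduled again."""
    # Sequence IDs are assigned in the order the orchestrator schedules tasks.
    scheduled = [TaskScheduled(i, name) for i, name in enumerate(orchestrator())]
    for recorded in history:
        if recorded not in scheduled:
            raise NonDeterministicWorkflowError(
                f"history has activity '{recorded.name}' with sequence ID "
                f"{recorded.sequence_id}, but replay has not scheduled it"
            )
    return scheduled


def orchestrator_v1():
    """Code that was running when the instance started."""
    yield "DetectARInvoiceChanges"
    yield "PublishChanges"  # hypothetical second activity


def orchestrator_v2():
    """Changed code: a new first step shifts every later sequence ID."""
    yield "LoadSettings"  # hypothetical new activity
    yield "DetectARInvoiceChanges"


if __name__ == "__main__":
    # History written while orchestrator_v1 was the deployed code.
    history = [TaskScheduled(0, "DetectARInvoiceChanges")]
    replay(history, orchestrator_v1)  # same code: replay matches the history
    try:
        replay(history, orchestrator_v2)  # different code: replay mismatches
    except NonDeterministicWorkflowError as err:
        print(f"Non-deterministic workflow detected: {err}")
```

In this model, the failure appears whenever the code performing the replay schedules a different task at sequence ID 0 than the code that wrote the history.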
This is a duplicate of #2635. I originally opened it on a different repo and didn't see that it had been moved here, so I opened another issue. Closing this one, as #2635 is more accurate.