Azure / logicapps

Azure Logic Apps labs, samples, and tools

Workflows not getting timed out based on the host settings #782

Closed krisvijaykb closed 1 year ago

krisvijaykb commented 1 year ago

Describe the Bug

As per the doc (Run duration setting), workflows should time out after the configured duration. Instead, they continue to run until all actions have completed.

Plan Type

Standard

Steps to Reproduce the Bug or Issue

  1. Go to a Logic App Standard workflow.
  2. Add a Delay action with a value of 5 minutes.
  3. Go to the host.json file for the logic app.
  4. Add the "Runtime.Backend.FlowRunTimeout" setting with the value "0.00:01:00".
  5. Trigger the logic app.
  6. Notice that the workflow runs beyond the configured timeout.
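
To make step 4 concrete, here is a minimal Python sketch of where the setting would land in host.json. The nesting under extensions > workflow > settings is an assumption based on how Standard logic app host settings are usually documented, and the helper name with_flow_run_timeout is hypothetical.

```python
import json


def with_flow_run_timeout(host: dict, timespan: str) -> dict:
    """Return a copy of a host.json dict with the run-duration setting applied."""
    # Deep copy via a JSON round-trip so the caller's dict is left untouched.
    updated = json.loads(json.dumps(host))
    # Assumed location of workflow host settings: extensions.workflow.settings
    settings = (
        updated.setdefault("extensions", {})
        .setdefault("workflow", {})
        .setdefault("settings", {})
    )
    settings["Runtime.Backend.FlowRunTimeout"] = timespan
    return updated


if __name__ == "__main__":
    # The value from step 4 of the repro.
    print(json.dumps(with_flow_run_timeout({"version": "2.0"}, "0.00:01:00"), indent=2))
```

The printed JSON is what the repro's host.json would contain, alongside whatever other keys the file already has.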

Workflow JSON

No response

Screenshots or Videos

No response

Additional context

We also tried setting this value as an app setting and changing the duration to different limits. Nothing made the workflow time out.

AB#23805093

krisvijaykb commented 1 year ago

@AbodeSaafan, can you please help with this? I saw that you resolved a similar issue here: https://github.com/Azure/logicapps/issues/460

AbodeSaafan commented 1 year ago

This is a different question/issue than #460. @xuehongg, can you help with this? Thanks.

krisvijaykb commented 1 year ago

@xuehongg, any thoughts on this? Thanks.

krisvijaykb commented 1 year ago

@MayankBargali-MSFT, @AbodeSaafan, can anyone else help with suggestions on this, please?

xuehongg commented 1 year ago

@krisvijaykb

The minimum value for Runtime.Backend.FlowRunTimeout is 7 days (7.00:00:00). We will update the doc.
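
As an illustration of that limit, the following Python sketch parses a d.hh:mm:ss timespan and checks it against the 7-day minimum stated above. The function names are hypothetical, and the parser only handles the d.hh:mm:ss shape used in this thread, not every timespan format the runtime might accept.

```python
import re
from datetime import timedelta

# Minimum reported in this thread for Runtime.Backend.FlowRunTimeout.
MIN_FLOW_RUN_TIMEOUT = timedelta(days=7)

# Matches values such as "0.00:01:00" or "7.00:00:00" (days.hours:minutes:seconds).
_TIMESPAN = re.compile(r"^(?P<d>\d+)\.(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})$")


def parse_timespan(value: str) -> timedelta:
    """Convert a d.hh:mm:ss string into a timedelta."""
    match = _TIMESPAN.match(value)
    if match is None:
        raise ValueError(f"not a d.hh:mm:ss timespan: {value!r}")
    return timedelta(
        days=int(match["d"]),
        hours=int(match["h"]),
        minutes=int(match["m"]),
        seconds=int(match["s"]),
    )


def meets_minimum(value: str) -> bool:
    """True when the timespan is at least the 7-day minimum."""
    return parse_timespan(value) >= MIN_FLOW_RUN_TIMEOUT


if __name__ == "__main__":
    print(meets_minimum("0.00:01:00"))  # False: the value from the repro is below the minimum
    print(meets_minimum("7.00:00:00"))  # True
```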

krisvijaykb commented 1 year ago

@xuehongg, so we can't really test this in a dev environment with such a high value. Is there a setting to prevent workflows from running for a long time when an action becomes stuck?

xuehongg commented 1 year ago

@krisvijaykb

Most actions have a timeout of 2 minutes. In what scenario do you see an action becoming stuck? Do you have a repro? If so, we can take a look.

krisvijaykb commented 1 year ago

@xuehongg, we had an incident some time back caused by a Microsoft configuration issue with Node.js, where the correct Node.js library couldn't be loaded. This kept the workflows running for hours instead of timing out, and we had to cancel them manually.

We are not able to reproduce it now. Also, the run history has lapsed since it is past 90 days.

So we are looking for a setting to restrict workflows from running for more than an hour, if possible. Is there an appropriate setting for this?

I came across the setting "Runtime.FlowRunRetryableActionJobCallback.ActionJobExecutionTimeout" but am not sure whether it works for all actions (managed and built-in).

xuehongg commented 1 year ago

@krisvijaykb

I am not aware of a setting to restrict a workflow from running more than an hour. "Runtime.FlowRunRetryableActionJobCallback.ActionJobExecutionTimeout" is for individual actions, not for the whole workflow.

krisvijaykb commented 1 year ago

@xuehongg, thanks for the response.

I hope I can use "Runtime.FlowRunRetryableActionJobCallback.ActionJobExecutionTimeout" to keep actions from getting stuck or running for more than an hour, which would in turn keep the workflow from running for long hours.

Can you please confirm whether this setting works for all actions of the logic app (both managed and built-in actions)?

xuehongg commented 1 year ago

@krisvijaykb

This setting works for built-in connectors. Managed connectors behave the same way as in Logic Apps Consumption: they time out after 2 minutes.

krisvijaykb commented 1 year ago

@xuehongg, thanks for your input.

I will apply this setting for built-in connectors and hope workflows won't get stuck for an hour or two when a similar incident happens in the future.

I can't really test the timeout because its minimum value is too high (7 days). I edited the MS doc to include this value. Hopefully MS will make it more flexible in the future, at least down to an hour.

For anyone landing here with a similar issue: we settled on a workaround, a workflow with a recurrence trigger (twice a day) that scans the run time of all workflows and cancels the ones that have been running for more than an hour.
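
The selection step of such a monitoring job could look roughly like the Python sketch below. It covers only the filtering logic: the run records are hypothetical flat dicts with name, status, and startTime keys (loosely modeled on run-history fields), timestamps are assumed to be ISO 8601 with an explicit UTC offset, and listing or cancelling runs through the service is left out entirely.

```python
from datetime import datetime, timedelta, timezone


def find_overdue_runs(runs, now, limit=timedelta(hours=1)):
    """Return the names of runs still in progress that started more than `limit` ago."""
    overdue = []
    for run in runs:
        # Only runs that are still in progress are candidates for cancellation.
        if run["status"] != "Running":
            continue
        started = datetime.fromisoformat(run["startTime"])
        if now - started > limit:
            overdue.append(run["name"])
    return overdue


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    sample_runs = [  # made-up sample data
        {"name": "run-a", "status": "Running", "startTime": "2024-01-01T09:30:00+00:00"},
        {"name": "run-b", "status": "Running", "startTime": "2024-01-01T11:45:00+00:00"},
        {"name": "run-c", "status": "Succeeded", "startTime": "2024-01-01T08:00:00+00:00"},
    ]
    print(find_overdue_runs(sample_runs, now))  # ['run-a']
```

Passing `now` in explicitly keeps the function easy to test; a scheduled job would supply the current UTC time and then cancel each returned run.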

Ricky-G commented 1 year ago

A PR has been submitted to make the documentation clear that the minimum for the setting is currently 7 days. A simple way to control the workflow timeout for each logic app is to have two flows from the start:

  1. Your actual flow that you currently have
  2. An alternate flow where you control the delay timer and terminate

For reference, see https://clouddev.blog/Azure/Logic-Apps/azure-logic-apps-timeout/#more
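
A rough sketch of what the second (timer) branch might look like as workflow actions, built here as a Python dict. This is not taken from the linked post: the action names Timeout_delay and Timeout_terminate are hypothetical, and the "Wait" and "Terminate" action types and their input shapes are assumptions based on the Workflow Definition Language as commonly documented.

```python
import json


def watchdog_branch(minutes: int) -> dict:
    """Build a parallel branch that waits `minutes`, then terminates the run."""
    return {
        # Starts right after the trigger (empty runAfter), alongside the real flow.
        "Timeout_delay": {
            "type": "Wait",
            "inputs": {"interval": {"count": minutes, "unit": "Minute"}},
            "runAfter": {},
        },
        # Runs only once the delay has elapsed and ends the whole run.
        "Timeout_terminate": {
            "type": "Terminate",
            "inputs": {"runStatus": "Cancelled"},
            "runAfter": {"Timeout_delay": ["Succeeded"]},
        },
    }


if __name__ == "__main__":
    # These actions would be merged into the workflow's existing "actions" object.
    print(json.dumps(watchdog_branch(60), indent=2))
```

With this shape, the main branch presumably needs its own Terminate action at the end. Otherwise a run that finishes early would likely stay open until the delay elapses.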

krisvijaykb commented 1 year ago

@Ricky-G,

Yeah, the documentation didn't have the 7-day limit. I have already merged a PR to add this to the doc. Looks like you have moved the line a little higher.

Your approach will work, but we already have a lot of workflows running. We decided it's better to have a monitoring workflow that scans and cancels any workflow running longer than the desired time limit.