Azure / logicapps

Azure Logic Apps labs, samples, and tools
MIT License

Unexpected number of runs for trigger "When one or more messages arrive in a queue (peek-lock)" #860

Closed MrRosendahl closed 10 months ago

MrRosendahl commented 1 year ago

Describe the Bug with repro steps

  1. Create a new Stateful workflow and use the Service Bus trigger (not the in-app version) for either "When one or more messages arrive in a queue (peek-lock)" or "When one or more messages arrive in a topic (peek-lock)"; the bug is the same for both.
  2. Leave default settings, i.e. split on = true and concurrency = off.
  3. Use Managed Identity (system assigned) to connect to the service bus.
  4. Set the maximum message count to a low value, in this case 2.
  5. Set the recurrence frequency to day and interval to 10 (high enough so it does not mess with the test).
  6. Create an action which completes the message.
  7. Add 10 messages to the queue.
  8. Run the trigger (twice... since the first one often does not fire in the portal)

Before running trigger: Message count in the queue equals: 10. Max deliver count: 2.

Expected result: Number of triggers fired: 1. Number of history runs: 2. Message count in the queue after the run should be: 8.

Actual result: Number of triggers fired: 16. Number of history runs: 10. Message count in the queue after the run equals: 0.

The bug is the same for both "When one or more messages arrive in a queue (peek-lock)" and "When one or more messages arrive in a topic (peek-lock)". My colleagues have the same problem.

This used to work when using Logic Apps Consumption.

What type of Logic App Is this happening in?

Standard (Portal)

Are you using new designer or old designer

New Designer

Did you refer to the TSG before filing this issue? https://aka.ms/lauxtsg

Yes

Workflow JSON

{
    "definition": {
        "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
        "actions": {
            "Complete_the_message_in_a_queue": {
                "inputs": {
                    "host": {
                        "connection": {
                            "referenceName": "servicebus-2"
                        }
                    },
                    "method": "delete",
                    "path": "/@{encodeURIComponent(encodeURIComponent(parameters('serviceBusQueueName')))}/messages/complete",
                    "queries": {
                        "lockToken": "@{triggerBody()?['LockToken']}",
                        "queueType": "Main",
                        "sessionId": ""
                    }
                },
                "runAfter": {},
                "type": "ApiConnection"
            }
        },
        "contentVersion": "1.0.0.0",
        "outputs": {},
        "triggers": {
            "When_one_or_more_messages_arrive_in_a_queue_(peek-lock)": {
                "inputs": {
                    "host": {
                        "connection": {
                            "referenceName": "servicebus-2"
                        }
                    },
                    "method": "get",
                    "path": "/@{encodeURIComponent(encodeURIComponent(parameters('serviceBusQueueName')))}/messages/batch/head/peek",
                    "queries": {
                        "maxMessageCount": 2,
                        "queueType": "Main",
                        "sessionId": "None"
                    }
                },
                "recurrence": {
                    "frequency": "Day",
                    "interval": 10
                },
                "splitOn": "@triggerBody()",
                "type": "ApiConnection"
            }
        }
    },
    "kind": "Stateful"
}

Screenshots or Videos

Queue Settings: [screenshots: QueueSettings, QueueProperties]

Queue Before Run: [screenshot: BeforeRunningTrigger_ServiceBusExplorer]

Workflow Trigger Before Manual Run: [screenshots: BeforeRunningTrigger_LogicAppTriggerHistory, BeforeRunningTrigger_LogicAppRunHistory]

Workflow Trigger After Manual Run: [screenshots: AfterRunningTrigger_LogicAppTriggerHistory, AfterRunningTrigger_LogicAppRunHistory]

Queue After Run: [screenshot: AfterRunningTrigger_ServiceBusExplorer_Expected_8_Left]

Browser

Microsoft Edge

Additional context

No response

AB#24933340

AB#24937486

rllyy97 commented 1 year ago

Hi @MrRosendahl, thanks for reporting your issue in detail. This appears to be a backend bug, so I'm going to move this over to the repo that our backend devs keep an eye on for triage.

MrRosendahl commented 1 year ago

@rllyy97 does the backend team have an internal ticketing system, or is there any way to get an update or a reference to that ticket? The university (Lulea University of Technology) I'm working at really needs to know the status and how soon this can be fixed in our environment(s).

If it makes things easier I can pm you my work e-mail.

MrRosendahl commented 1 year ago

@rllyy97 I've tried to create a "Support request" inside Azure on our subscription, but it looks like that's only possible when you get an exception in the logic app. Since this is a bug we want it connected to our SLA.

rllyy97 commented 1 year ago

@mrRosendahl, I'm not aware of that requirement to have an exception in the logic app, but here are the docs on creating a support request if you haven't seen them already: https://learn.microsoft.com/en-us/azure/azure-portal/supportability/how-to-create-azure-support-request. I'll ping some backend folks as well to see if they can give you some info.

apranaseth commented 1 year ago

@MrRosendahl this is by design. When the Service Bus trigger receives messages, it immediately tries to get more messages, to support high-throughput scenarios. If it doesn't find any messages, the recurrence interval is honored and the trigger checks for more messages only after that interval has passed. More trigger runs appear because split-on is enabled on the trigger.

MrRosendahl commented 1 year ago

@apranaseth okay. What do you do for scenarios where you actually want the trigger to do what the settings tell it to do? There is no setting, to my knowledge, to disable the "high throughput", i.e. "fetch more messages if you find any", or is there?

We don't want to use up all the CPU in our environment... We're talking about a huge number of messages, and we want to control the data throughput. We want to be able to rely on the settings being used, and not have some default / "by design" behavior override them. Btw, this "by design" behavior works differently from the Consumption one. That one used the settings.

The workaround for this is to use a recurrence trigger and fetch from the topic via the Service Bus action, since that does what it is set to do and nothing else... But we would rather use the trigger and be able to follow each message as its own run.
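The workaround above can be sketched as a small model. This is only an illustration, not Logic Apps code: `run_fixed_batches` and its parameters are hypothetical names, and the model assumes each recurrence tick performs exactly one fetch of at most `max_message_count` messages.

```python
def run_fixed_batches(queue_depth, max_message_count, ticks):
    """Model a recurrence trigger followed by one explicit fetch per tick.

    Hypothetical helper: each tick takes at most max_message_count
    messages and never polls again until the next tick.
    """
    remaining = queue_depth
    fetched_per_tick = []
    for _ in range(ticks):
        taken = min(max_message_count, remaining)
        remaining -= taken
        fetched_per_tick.append(taken)
    return fetched_per_tick, remaining


# One tick against the 10-message queue from the repro steps.
print(run_fixed_batches(10, 2, 1))  # ([2], 8)
```

With the values from the repro steps, one tick leaves 8 messages in the queue, which is the result the report expected from the trigger.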

apranaseth commented 1 year ago

This behavior should be the same for Consumption and Logic Apps Standard, as that's the default behavior of the managed Service Bus connector. Could you elaborate on what worked differently in Consumption? I would also recommend opening a CSS case and sharing the workflow details along with the behavior for the team to look at.

yoHasse commented 1 year ago

@apranaseth I agree with @MrRosendahl that the design of the setting is confusing.

Explanation:

1. Settings:

2. Behavior:

With the above settings:

Simplified Example:

Imagine you have a mailbox that receives letters. This mailbox is special because it has a timer and opens every 10 minutes to let you take letters.

Your settings are:

Every 10 minutes:

  1. The mailbox opens.
  2. You take 2 letters.
  3. You place each letter in its own tray (equivalent to the splitOn feature).

If you were to take all the letters in the mailbox every time it opened, then the rule of taking only 2 letters would be pointless. Similarly, if the trigger fetches all messages from the service bus, the settings I provided become meaningless.

MrRosendahl commented 1 year ago

@apranaseth When you say "the trigger tries to get more messages immediately for high throughput scenarios", that's part of the definition of the prefetchCount property in the SDK: "Gets the number of messages that will be eagerly requested from Queues or Subscriptions during processing. This is intended to help maximize throughput by allowing the processor to receive from a local cache rather than waiting on a service request.". https://learn.microsoft.com/en-us/dotnet/api/azure.messaging.servicebus.servicebusprocessor.prefetchcount?view=azure-dotnet#definition

When looking at the Azure SDKs (which I assume are being used in the background by the trigger), the default value for prefetching is 0, i.e. no prefetching should take place. This means that the trigger overrides this value by design and this overridden value cannot be changed... which is weird, since it's a property. https://learn.microsoft.com/en-us/dotnet/api/azure.messaging.servicebus.servicebusreceiver.prefetchcount?view=azure-dotnet

https://learn.microsoft.com/en-us/azure/service-bus-messaging/service-bus-performance-improvements?tabs=net-standard-sdk-2#prefetching

apranaseth commented 1 year ago

@MrRosendahl It's not the prefetch. It is the managed API connector design: the connector returns a retry-after header to Logic Apps, and for the Service Bus and Event Hubs connectors, if the trigger got messages, that header is set to retry immediately to get more messages for high throughput. It has been like this forever. I will work with our docs team to ensure we document this as well. FYI, the managed API connector still uses the SBMP SDK from Service Bus.
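The retry behavior described above can be sketched as a small simulation. This is a minimal model under stated assumptions, not the connector's implementation: `simulate_polling` is a hypothetical name, every non-empty poll is assumed to be followed by an immediate retry, and split-on is assumed to create one run per message.

```python
from collections import deque


def simulate_polling(queue_depth, max_message_count):
    """Model one recurrence window of a polling trigger that retries
    immediately whenever the previous poll returned messages.

    Hypothetical model: returns (polls, runs, remaining_messages).
    """
    queue = deque(range(queue_depth))
    polls = 0
    runs = 0
    while True:
        polls += 1
        batch_size = min(max_message_count, len(queue))
        batch = [queue.popleft() for _ in range(batch_size)]
        if not batch:
            # Empty poll: the recurrence interval applies from here on.
            break
        # Split-on: each message in the batch becomes its own run.
        runs += len(batch)
    return polls, runs, len(queue)


# Values from the repro steps: 10 queued messages, maxMessageCount = 2.
print(simulate_polling(10, 2))  # (6, 10, 0)
```

Under these assumptions the queue is drained within a single recurrence window and 10 runs are created, matching the reported run count and final queue depth. The model does not try to reproduce how the portal counts trigger history entries.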

MrRosendahl commented 1 year ago

@apranaseth Thank you for the clarification on the retry for high throughput! 👍 I'm not sure it is documented that the Azure Service Bus retry header overrides the trigger setting "maxMessageCount", and there is no trigger option available to set a Service Bus header to "no retry", at least to my knowledge... it might exist somewhere out there 😄

💭 What are your thoughts on this, given your or your team's knowledge of Azure Service Bus: how can we use an Azure Logic Apps Standard Service Bus trigger to always run a fixed number of messages (or fewer if fewer messages are available)?

apranaseth commented 1 year ago

There is a documentation gap on that; we will consider improving it. To be precise, maxMessageCount does not get changed in this case; it is still honored. The connector will try to get <= maxMessageCount messages from the service bus, and if it gets messages, it will try to get more in the next trigger run. You should be able to use concurrency in this case. This is currently not supported for the built-in triggers, but it can be used with the managed API connection. As we speak, we are working on providing concurrency support in the built-in trigger as well.
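As a generic illustration of what a concurrency limit does, the sketch below caps the number of runs in flight with a semaphore. It is not how the Logic Apps runtime is implemented: `process_with_limit` is a hypothetical name, and each "run" is simulated with a single event-loop yield.

```python
import asyncio


async def process_with_limit(messages, limit):
    """Process every message, but never more than `limit` at a time.

    Hypothetical helper: returns (peak_in_flight, completed_count).
    """
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0
    completed = []

    async def run(message):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # stand-in for the work of one run
            in_flight -= 1
            completed.append(message)

    await asyncio.gather(*(run(m) for m in messages))
    return peak, len(completed)


peak, done = asyncio.run(process_with_limit(range(10), 2))
print(peak, done)  # 2 10
```

All 10 messages are still processed, but at most 2 are in flight at any moment, which is the kind of throughput control asked for earlier in the thread.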

MrRosendahl commented 1 year ago

Sadly the in-app trigger requires setting up a vnet... and permissions, which makes it not so useful/user friendly.

apranaseth commented 1 year ago

If you want to use the peek-lock trigger with complete, that is what is required as of now. It is designed that way because of a constraint in the Service Bus SDK: the same receiver that picked up the message must be used to complete it.

MrRosendahl commented 1 year ago

[...] You should be able to use concurrency in this case, This is currently not supported for the built-in triggers, [...]. As we speak we are currently working on providing the concurrency support in the built-in trigger as well.

@apranaseth, is there any publicly available roadmap for upcoming/planned releases? So that we know when we can start using concurrency with this trigger.

MrRosendahl commented 12 months ago

@apranaseth any update on the concurrency on the trigger?

As we speak we are currently working on providing the concurrency support in the built-in trigger as well.

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 45 days with no activity.

github-actions[bot] commented 10 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

MrRosendahl commented 9 months ago

Is this really done or did some automatic script mark this to Done?