Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

ServiceBus trigger with managed identity doesn't wake up consumption plan function apps when Connection isn't explicitly defined #8261

Open Archomeda opened 2 years ago

Archomeda commented 2 years ago

This is for in-process v4 function apps; I'm unsure whether this also applies to the isolated variant.

When a Service Bus trigger is defined with a connection via managed identity, the function app doesn't get woken up by Azure when the Connection property is not defined in the trigger attribute, despite the trigger working fine when the function app is active.

Investigative information

Repro steps

  1. Create a function app with a service bus trigger via managed identity without specifying the Connection property
  2. Publish to Azure
  3. Wait for the function app to go inactive
  4. Post a message to the service bus queue

Expected behavior

The function app should get woken up and the service bus triggered function should get triggered. As far as I know, the Connection property was never mandatory for triggers that use connection strings, so this feels like an invisible breaking change when moving to managed identity based connections.

Actual behavior

The function app does not get woken up, and the service bus triggered function is not triggered.

Known workarounds

Define the Connection property that points to the name of the configuration (and gets resolved to <ConnectionName>__fullyQualifiedNamespace).
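
As a rough sketch of that workaround, assuming the trigger sets Connection = "ServiceBus" and using a placeholder namespace name (not a value from this issue), the matching identity-based app setting would look something like this:

```
ServiceBus__fullyQualifiedNamespace=<your-namespace>.servicebus.windows.net
```

The prefix before the double underscore has to match the value given to the Connection property.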

Related information

I've noticed that the generated function.json only includes the connection field if the Connection property is defined. This makes me believe that the default value "ServiceBus" is only considered in the functions runtime, and not in the Azure scale controller for function apps.

Source

```cs
[FunctionName("ServiceBusTest")]
public Task Run([ServiceBusTrigger("queue-name")] QueueMessage message)
{
    // This doesn't get triggered if the function app is inactive
    return Task.CompletedTask;
}
```

The second trigger generates a function.json file **with** the connection property, because it's now explicitly defined:

```cs
[FunctionName("ServiceBusTest2")]
public Task Run([ServiceBusTrigger("queue-name", Connection = "ServiceBus")] QueueMessage message)
{
    // This does get triggered if the function app is inactive
    return Task.CompletedTask;
}
```

v-bbalaiagar commented 2 years ago

Hi @Archomeda, thank you for your feedback! We will look into this internally and update you with our findings.

Xodust commented 2 years ago

I am experiencing the same issue, but the workaround does not work for me. Any word on a resolution to the issue?

fabiocav commented 2 years ago

@TsuyoshiUshio would you be able to validate/investigate this use case?

@mattchenderson for awareness/comments

fabiocav commented 2 years ago

Reached out to the Scale Controller team for investigation.

TsuyoshiUshio commented 2 years ago

Unfortunately, the Scale Controller doesn't support ServiceBus as the default connection name. It does support AzureWebJobsServiceBus. The V5 extension, however, appears to use ServiceBus as its default value: https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/servicebus/Microsoft.Azure.WebJobs.Extensions.ServiceBus/src/Config/ServiceBusClientFactory.cs#L61 We are working on a Scale Controller improvement, but this fix won't arrive soon. As a mitigation, please avoid relying on the default configuration for now.

boylec commented 2 years ago

Sorry, I'm not following this response from @TsuyoshiUshio.

To be clear, how are consumers of Azure Service Bus that want to use Managed Identity supposed to bind their functions?

Are you saying that we can't use ServiceBus__fullyQualifiedNamespace as an app setting and expect functions that use the ServiceBusTrigger to work as designed - instead we must use a different prefix than "ServiceBus"?

Also I think @TsuyoshiUshio's response may be addressing a separate issue having to do with configuration.

I am using "ServiceBusConnection__fullyQualifiedNamespace" and my function does read from service bus queues at least initially.

But once my function idles and JobHost goes to sleep (because I'm on Consumption plan) the next message in the queue does not wake the JobHost back up.

This is not related to the name of the app setting.

When I was opening an Azure Support ticket I saw a suggestion in the "recommendations" step before actually creating the ticket that once a JobHost falls asleep (under consumption plan), it won't wake again until an HTTP request is received from the ScaleController.

Seems this may not be happening in the case of ServiceBusTrigger functions? (No HTTP request from the ScaleController to wake the JobHost back up?)

boylec commented 2 years ago

I believe I have resolved my issue. Details below.

The scale controller (which we end users have no control over, from what I can tell) is responsible for waking the JobHost up to consume messages when a service bus queue has any messages.

  1. I've added the Azure Service Bus Data Owner role (along w/ Azure Service Bus Data Reader) to my function app's managed identity (which it didn't have before). I am hypothesizing that the scale controller component runs under this managed identity and needs visibility into the queue length to determine whether to spin the function app up or not. Queue length cannot be looked at without the Azure Service Bus Data Owner role. Can anyone confirm that this is how the scale controller works? I'm not 100% on if this is part of what resolved my situation but I did add this role to my function app.
  2. This one I'm sure about. My binding for the queue name on my ServiceBusTrigger uses a hierarchical config setting.
    public async Task Run([ServiceBusTrigger("%ServiceBus:QueueName:Email%", Connection = "ServiceBusConnection")] string queueMessage)

    but my app settings use different hierarchical separators:

    ServiceBus__QueueName__Email (no colons)

    The scale controller was trying to resolve the queue name by scanning the ServiceBusTrigger attribute in my code (via assembly scanning I'm guessing) and seeing "ServiceBus:QueueName:Email" in that binding, and then trying to resolve "ServiceBus:QueueName:Email" in my app settings but since my app settings use different separators the latter fails.

Long story short: Using consistent hierarchical config separators for both app settings and ServiceBusTrigger binding allows the scale controller to resolve the queue name and start waking up the JobHost again to consume messages.

Super helpful note if you're experiencing issues with your function app waking up for ServiceBusTrigger functions.

Scale controller logging can be enabled using the following as an app setting for the function app. See here for more info on configuring scale controller logging.

SCALE_CONTROLLER_LOGGING_ENABLED=AppInsights:Verbose

After that, you can see these emitted logs by opening log analytics for your app insights instance w/ this query (making sure the scope of your log query is the app insights instance connected to your function app)

traces
| where customDimensions.Category == "ScaleControllerLogs"

This might reveal exceptions happening when the scale controller is trying to parse out your ServiceBusTrigger function binding. If it can't parse your binding it won't know what queue to look at in order to decide whether to wake your JobHost up to consume received messages.

Could be solution for #7762 as well

hbrotan commented 2 years ago

@boylec I'm experiencing the same issue as you:

my function idles and JobHost goes to sleep (because I'm on Consumption plan) the next message in the queue does not wake the JobHost back

I'm trying to use a User Assigned Managed Identity, with the connection setting "ServiceBusConnection__fullyQualifiedNamespace"

The scale controller logs:

[ManagedIdentity] Created NamespaceManager with ManagedIdentity
[ManagedIdentity] Created QueueClient with ManagedIdentity

This seems legit.

But the problem still exists -> The messages received when the function is idle will not be processed.

According to the docs (Azure WebJobs Service Bus client library for .NET) this should be straightforward, but there's no mention of scale controller, roles etc.

Can anyone shed some light on this?

boylec commented 2 years ago

@hbrotan I use a system assigned managed identity so not 100% sure how it works with UAMI.

My guess: You most likely need to assign the Azure Service Bus Data Owner and Azure Service Bus Data Reader roles. Have you tried that?

Azure Service Bus Data Owner allows for queue operations and status reads. Azure Service Bus Data Reader allows the entity to actually consume messages from the queue. That's my understanding.

hbrotan commented 2 years ago

@boylec Still doesn't work for me, using User-Assigned Managed Identity. The MI has Azure Service Bus Data Receiver for the queue it's supposed to read from, and Azure Service Bus Data Owner for the service bus, so that should be sufficient.

hbrotan commented 2 years ago

@fabiocav It would be really useful to get a comment from MS about this issue. A function not picking up messages from ServiceBus after going idle is a big showstopper for us.

brettsam commented 2 years ago

@TsuyoshiUshio -- can you summarize what a customer needs to do to get Scale Controller working with Managed Identity?

@alrod for visibility as well, in case there's anything in the SB extension that can be addressed.

hbrotan commented 2 years ago

@boylec It seems to be working now, using User-assigned Managed Identity 🙏
I haven't done any changes, so a bit unsure why it didn't work three days ago, but I'll take it.

Seems like the solution was like you said: Adding the Azure Service Bus Data Owner for the Managed Identity (I added it on the namespace /root level). Now, messages are consumed after the function goes idle. It takes about a minute before the message is consumed, which seems a bit long, but I guess this is a known issue for consumption plans.

NB! It did not work when adding the Azure Service Bus Data Owner for the Managed Identity on the queue itself, it needed to be on the Service Bus level.

@brettsam These docs vaguely describe the role requirements, but there is no mention of the scale controller or why/where the Data Owner role is required. Also, there's no mention of roles or required config for User-assigned Managed Identity in the SB extension docs.

hbrotan commented 2 years ago

I've tried to summarize my experiences (and this GitHub issue) in a blog post. Might be of use to someone stumbling into this issue.

fabiocav commented 2 years ago

@mattchenderson there's an opportunity for a docs update here. Can we track this work to make sure the role requirements are covered?

Thanks!

fabiocav commented 2 years ago

@TsuyoshiUshio any other enhancements we should be considering so we can make this easier?

TsuyoshiUshio commented 1 year ago

Hi @fabiocav, I have a PR in the internal repo and a work item tracking this. Once the PR has been merged and a new Scale Controller release that includes the fix is out, I'll let you know.

slabarque commented 1 year ago

I have the same issue, I think. My issue is described in #8991. I have followed the advice given here and in @hbrotan's blog post. What I have tried:

But my function, when idle, still does not get triggered by new messages in the servicebus. I have also enabled the scale controller logs but I see no logs. The "Functions that are not triggering" detector in "Diagnose and solve problems" shows this: image

I have created an MS support ticket for this. The support engineer will contact the product team, so @TsuyoshiUshio, this might land in your inbox :-)

jeroenmaes commented 1 year ago

I was able to resolve the issue of @slabarque by hardcoding the topic and subscription name as mentioned in this issue: https://github.com/Azure/azure-functions-host/issues/7762#issuecomment-958713357

liliankasem commented 1 year ago

@TsuyoshiUshio any updates on the fix for this?

mdsharpe commented 9 months ago

Experiencing similar issue with Cosmos trigger. This seems to be a complete blocker for Consumption plan. Any progress?

mr-davidc commented 8 months ago

I seem to be running into this issue as well. I've tried pretty much everything in this issue to try and resolve it, still to no avail. New messages on the service bus topic simply don't wake up the Functions host.

I've checked the UAMI I am using and it has the required permissions and I've got all the right app settings configured.

I tried enabling the Scale Controller Logs but the query returns no results when I try to find the logs...

Is there anything else I could be missing to get this going?

3bdNKocY commented 7 months ago

Adding my voice here as well, with very similar issues as the previous commenters.

As long as I keep the function alive it keeps on processing messages. Once the function goes to sleep it seems like the Scale Controller can't / won't / doesn't wake up my AF. This is blocking our migration away from AFv1 into AFv4.

mayank1495 commented 3 months ago

I faced a similar issue, adding my resolution here if it helps anyone:

To ensure proper functioning of Service Bus and related triggers on the consumption plan with User Assigned Managed Identity, please also add the following configurations along with __fullyQualifiedNamespace to the app settings of the function app. Without these settings, the function app may not trigger correctly after being idle, as the ScaleController won't be able to authenticate using the UA Managed Identity.

<CONNECTION_NAME_PREFIX>__credential="managedidentity"
<CONNECTION_NAME_PREFIX>__clientId="<Identity_CLIENT_ID>"

Additionally, assign the necessary roles for the Service Bus to your MI:

If you add a role at the topic level, you might also need to add it at the subscription level using the Azure CLI, as the portal does not support adding roles to subscriptions. Please refer to the documentation for detailed instructions. Doc - Guidance for developing Azure Functions | Microsoft Learn

I have added these settings to my function app, which was facing the issue, and now it seems to be working fine.
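
Put together, and assuming a connection prefix of ServiceBusConnection (the name used earlier in this thread) with placeholder values, the app settings for a user-assigned identity might look something like this:

```
ServiceBusConnection__fullyQualifiedNamespace=<your-namespace>.servicebus.windows.net
ServiceBusConnection__credential=managedidentity
ServiceBusConnection__clientId=<client-id-of-the-user-assigned-identity>
```

The prefix has to match the Connection value on the trigger. The namespace and client ID shown are placeholders, not values from this issue.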

black-snow commented 3 months ago

This is over a year old now ... I don't get it to work. I use a managed identity, I've assigned Data Receiver as well as Owner but the function just won't get triggered. I followed this. How the hell do I get a service bus queue to trigger a consumption plan linux function app?

samuel-kogler-AP commented 2 months ago

@black-snow I have had this same issue, switched to a Windows consumption plan due to issues with managed identity on linux

black-snow commented 2 months ago

Interesting. I completely ditched Functions instead ...

ffarinha-msft commented 4 weeks ago

I have come across this same issue.

Assuming you have followed the documentation and given the function app's MSI all the necessary permissions to access the service bus, there is one important detail to be aware of: you need to pass only the service bus hostname/namespace, like this: "MY_SERVICEBUS.servicebus.windows.net".

If you pass "https://MY_SERVICEBUS.servicebus.windows.net", it will not work.