Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
MIT License

Add documentation for managing orchestrations/entities from external clients #1600

Open wynandjordaan opened 3 years ago

wynandjordaan commented 3 years ago

Description

We have run into an issue where Durable Functions cannot run in Kubernetes when deployed. I am using the "func kubernetes deploy" command to generate the YAML file for deployment. By default, the HTTP and non-HTTP functions are split into two deployments/services. This creates a function whitelist in environment variables like:

env:
- name: AzureFunctionsJobHost__functions__0
value: GetWorkflowStatus_V1
- name: AzureFunctionsJobHost__functions__1
value: StartWorkflow_V1
- name: AzureFunctionsJobHost__functions__2
value: TerminateWorkflow_V1
- name: AzureFunctionsJobHost__functions__3
value: TriggerWorkflow_V1

Expected behavior

I expect the HTTP functions to be able to run separately from the orchestrator and activities.

Actual behavior

Currently, when deployed to Kubernetes, we receive an error when calling the StartNewAsync method saying that the orchestrator function cannot be found or is disabled.

Error: "The function 'RoadsideBatteryJobWorkflow' doesn't exist, is disabled, or is not an orchestrator function. Additional info: No orchestrator functions are currently registered!" "Asgard.Odin.Workflows.RoadsideBatteryJob.Triggers.StartJobWorkflow" System.ArgumentException: The function 'RoadsideBatteryJobWorkflow' doesn't exist, is disabled, or is not an orchestrator function. Additional info: No orchestrator functions are currently registered! at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableTaskExtension.ThrowIfFunctionDoesNotExist(String name, FunctionType functionType) in D:\a\r1\a\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\DurableTaskExtension.cs:line 1062 at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableClient.Microsoft.Azure.WebJobs.Extensions.DurableTask.IDurableOrchestrationClient.StartNewAsync[T](String orchestratorFunctionName, String instanceId, T input) in D:\a\r1\a\azure-functions-durable-extension\src\WebJobs.Extensions.DurableTask\ContextImplementations\DurableClient.cs:line 140 at Asgard.Odin.Workflows.RoadsideBatteryJob.Triggers.StartJobWorkflow.Run(HttpRequestMessage request, String instanceId, IDurableClient orchestrationClient) in /src/dotnet-function-app/Asgard.Odin.Workflows.RoadsideBatteryJob/Triggers/StartJobWorkflow.cs:line 59

Known workarounds

A workaround is to add the Orchestrator function in the whitelist on the HTTP functions deployment. This is not a great solution as any new deployments will have to be manually edited to be able to run.

ConnorMcMahon commented 3 years ago

There are a few different ways to invoke orchestrations on a function app from an external function app acting as a client.

Note that the first two methods assume that both function apps have the Durable extension installed and use different TaskHubName + storage account combinations. If the two apps share a TaskHubName and storage account, you are likely to run into serious problems even if you manage to start the orchestrations.

Easiest Approach: Use Durable Client bindings

I'm assuming this is the approach you have already tried. Note that by default, the IDurableClient will try to create orchestrations on its own task hub and storage account. In order to send orchestrations to an external task hub, you must provide a different TaskHub and/or StorageConnectionName value. These values can be provided on the binding attribute, or some of our APIs allow them to be passed in directly.

If you provide a different TaskHub and/or StorageConnectionName value, we should bypass the check for whether your application has this function itself. If this is not the case, this is a bug that we need to fix.
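
For reference, a minimal sketch of such a binding might look like this. The "ExternalTaskHub" and "ExternalStorage" app setting names and the "MyOrchestrator" function name are placeholders, not names from this thread:

using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class StartExternalOrchestration
{
    [FunctionName("StartExternalOrchestration")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        [DurableClient(TaskHub = "%ExternalTaskHub%", ConnectionName = "ExternalStorage")]
            IDurableClient client)
    {
        // Because the client targets a different task hub, the local
        // function-existence check should be bypassed.
        string instanceId = await client.StartNewAsync("MyOrchestrator");
        return client.CreateCheckStatusResponse(req, instanceId);
    }
}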

Next easiest approach: Use new IDurableClientFactory service

As a part of v2.4.0, we now allow customers to add the below code to their Startup.cs Configure method:

// AddDurableClientFactory() registers IDurableClientFactory as a service so the application
// can consume it and call the Durable Client APIs
services.AddDurableClientFactory();
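
For context, a complete Startup.cs under this approach might look like the following sketch (this assumes the Microsoft.Azure.Functions.Extensions package for FunctionsStartup; the MyApp namespace is a placeholder):

using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection;

[assembly: FunctionsStartup(typeof(MyApp.Startup))]

namespace MyApp
{
    public class Startup : FunctionsStartup
    {
        public override void Configure(IFunctionsHostBuilder builder)
        {
            // Registers IDurableClientFactory with the DI container so it
            // can be injected into functions.
            builder.Services.AddDurableClientFactory();
        }
    }
}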

Then, if you use DI to inject this service into your HTTP client function, you can use code like the below to generate a client that is designed to be used by external applications.

var client = clientFactory.CreateClient(new DurableClientOptions
{
    ConnectionName = "Storage",
    TaskHub = configuration["TaskHub"]
});
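
Putting the pieces together, an HTTP-triggered client function using constructor injection might look like the following sketch; the class, function, and orchestrator names are placeholders, and the "Storage"/"TaskHub" setting names come from the snippet above:

using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.DurableTask.ContextImplementations;
using Microsoft.Azure.WebJobs.Extensions.DurableTask.Options;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Configuration;

public class StartWorkflowHttp
{
    private readonly IDurableClientFactory _clientFactory;
    private readonly IConfiguration _configuration;

    public StartWorkflowHttp(IDurableClientFactory clientFactory, IConfiguration configuration)
    {
        _clientFactory = clientFactory;
        _configuration = configuration;
    }

    [FunctionName("StartWorkflowHttp")]
    public async Task<IActionResult> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req)
    {
        // Build a client that targets the task hub named by the "TaskHub"
        // app setting, using the connection string in the "Storage" setting.
        IDurableClient client = _clientFactory.CreateClient(new DurableClientOptions
        {
            ConnectionName = "Storage",
            TaskHub = _configuration["TaskHub"]
        });

        string instanceId = await client.StartNewAsync("MyOrchestrator");
        return new OkObjectResult($"Started orchestration with ID = '{instanceId}'.");
    }
}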

The advantage of this approach is that you can use it in any .NET Core application, not just a function app. If your client app only uses HTTP triggers, then you may be able to make it an ASP.NET Core app.

Hardest approach: Directly call HTTP instance management APIs

I would strongly recommend against this approach, but the function app acting as the "server" in this case does expose HTTP APIs that can be called directly to start orchestrations.
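
A rough sketch of such a direct call, using the built-in webhook for creating a new orchestration instance; the host name, orchestrator name, and <system-key> value are placeholders, and the code query parameter carries the "server" app's system-level key:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class Program
{
    static async Task Main()
    {
        using var http = new HttpClient();

        // Built-in Durable Functions webhook for starting an orchestration;
        // taskHub and connection query parameters can also be supplied.
        var url = "https://myserverapp.azurewebsites.net/runtime/webhooks/durabletask"
                + "/orchestrators/MyOrchestrator?code=<system-key>";

        // The request body is passed to the orchestrator as its input.
        var response = await http.PostAsync(
            url, new StringContent("{}", Encoding.UTF8, "application/json"));
        response.EnsureSuccessStatusCode();

        // The response contains the instance id and status-query URLs.
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}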

I hope one of those options works for you, but please let us know if you try the first and that does not work, as that would be a bug we want to fix.

ConnorMcMahon commented 3 years ago

Honestly, the above commentary I added should probably be official documentation somewhere.

wynandjordaan commented 3 years ago

Hi Connor,

Thanks for the reply. I will explore the IDurableClientFactory approach.

However, just to give more details about what the func CLI tool does: we have one project in Visual Studio containing the HTTP functions, the durable orchestrator, and the activities. This means that the storage account and TaskHub are the same for the HTTP functions and the orchestrator.

The func CLI deploys this into two containers. One container has the activities and orchestrator functions enabled, and the other has just the HTTP functions enabled.

So a possible solution would be to create a client via the IDurableClientFactory like below:

var client = _durableClientFactory.CreateClient(new DurableClientOptions
{
    ConnectionName = "AzureWebJobsStorage",
    TaskHub = orchestrationClient.TaskHubName,
    // IsExternalClient = false?
});

And then use that client to start a new instance.

Do you foresee any issues with this, even if it is using the same TaskHub and connection?

wynandjordaan commented 3 years ago

An easier fix: if I specify that the IDurableClient is an external client, it also bypasses the check.

Change:

[DurableClient] IDurableClient orchestrationClient

to:

[DurableClient(ExternalClient = true)] IDurableClient orchestrationClient
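
In context, the starter's signature would then look something like this sketch (the function and orchestrator names are taken from the stack trace above; this assumes ExternalClient = true skips the local registration check, as discussed in this thread):

using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;

public static class StartJobWorkflow
{
    [FunctionName("StartJobWorkflow")]
    public static async Task<HttpResponseMessage> Run(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage request,
        [DurableClient(ExternalClient = true)] IDurableClient orchestrationClient)
    {
        // ExternalClient = true marks this client as external, bypassing
        // the check that the orchestrator is registered in this app.
        string instanceId = await orchestrationClient.StartNewAsync("RoadsideBatteryJobWorkflow");
        return orchestrationClient.CreateCheckStatusResponse(request, instanceId);
    }
}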

Do you think that this will cause other issues?

ConnorMcMahon commented 3 years ago

@wynandjordaan

I think setting ExternalClient = true should also work without causing issues.

However, the shared TaskHubName + StorageConnectionString is going to be a problem for your scenario. Note that from here on out, I am going to refer to the combination of these two things as a TaskHub.

In general, when a function app has a dependency on the Microsoft.Azure.WebJobs.Extensions.DurableTask NuGet package, it will load the Durable Functions extension. The Durable Functions extension will spin up a TaskHubWorker, which polls for work items from the activity and orchestration queues on that TaskHub. If both of your function apps share a TaskHub, then both function apps will be polling the TaskHub for the same work items.

This is where you will run into problems. Your "client" function app will receive work items for orchestrations and activities that it has no active functions for, resulting in bad behavior. Specifically, I think those "activities" and "orchestrations" will fail from the Durable Task Framework perspective, without even invoking any of your code!

There are two fairly straightforward workarounds for this:

  1. Make sure your function apps have different AzureWebJobsStorage app settings/environment variables. This will make the TaskHubWorkers look in separate storage accounts, so there won't be this conflict.
  2. Similar to the first approach, but use ConnectionName to specify the app setting or environment variable that contains your storage connection string instead of relying on the default, and use different values for your different apps.

That being said, if you want two separate function apps with different sets of functions enabled for each, I would strongly recommend keeping them as separate applications. It will make things far more maintainable in the long run. If you want to share code between the two apps, package it as a library consumed by both. Functions themselves cannot currently be packaged as libraries, so if you want shared functions, you can use compilation techniques like what we use in our tests to make sure that functions get compiled as part of both projects.

wynandjordaan commented 3 years ago

Hi Connor

I just saw that one of our durable functions has been running without throwing the exception. However, that one is using version 2.2.2 of the DurableTask extension.

My question is: does this version have the same issue as 2.4.1 with regard to the TaskHub combination, or did the internals change so that 2.2.2 will not run into the same issue as above?

I have also logged the issue with the Functions Core Tools team, as I think they should provide a way to bypass the splitting.

Thanks Wynand

aashish004 commented 3 years ago

@ConnorMcMahon How will this work for JS- and Python-based functions? Can you please point me to some documentation?

ConnorMcMahon commented 3 years ago

@wynandjordaan,

To clarify, the TaskHub + storage account combination is an intentional design decision that has been around since the beginning of Durable Functions. Essentially, if two apps share this combination, they will both be looking at the same storage resources to decide when to execute functions.

We are looking into options to allow function applications to load the Durable Functions extension to provide access to the client functionality without spinning up a full-blown TaskHubWorker, meaning the app won't actually try to execute functions scheduled on that TaskHub. This is likely a v2.5.0 or later work item.

EDIT: It turns out I was wrong about runtime issues when a client app and a server app share a storage account and task hub, assuming the client app has no Durable Functions-specific triggers. See #1629 for more discussion.

ConnorMcMahon commented 3 years ago

@aashish004,

JS and Python should work the same way, but instead of controlling the Durable Client binding with C# attributes, the binding is controlled via function.json. If you want to have a client function app, my recommendation would be to have your client app use a different dummy TaskHub name, and then reference the real task hub name in your binding JSON.

aashish004 commented 3 years ago

@ConnorMcMahon Thank you for the response. I figured that out for non-compiled/interpreted languages. But even with separate storage for the client function and the task hub, it still doesn't work; I don't know if it is an open bug or not. I had to put all functions in one container to make it work:

env:
        - name: AzureFunctionsJobHost__functions__0
          value: HelloHttp
        - name: AzureFunctionsJobHost__functions__1
          value: HOrchestrator
        - name: AzureFunctionsJobHost__functions__2
          value: HelloAcitvity
        - name: AzureWebJobsSecretStorageType
          value: kubernetes
        - name: AzureWebJobsKubernetesSecretName
          value: secrets/func-keys-kube-secret-hello-workflow

I tried the CPU scaler and it seems to work; everything is also working out fine.

I have a few questions though. Do you see any issues with this infra setup? Also, is it possible to use different storage, like Redis etc., for Durable Functions?

ConnorMcMahon commented 3 years ago

I believe it should be fine, @aashish004, as long as your client function app doesn't have any Durable Functions triggers (activities/orchestrations/entities). If it doesn't, we actually won't spin up a TaskHubWorker, so the client app won't try to process messages on your task hub.

As for alternative backends, we have a couple coming down the pipeline. The Redis provider is in early alpha and is unlikely to get much attention in the near future due to some technical challenges that would be difficult to overcome. We have one in public preview based on Event Hubs + Azure Storage that is designed for high-performance scenarios. In the near future, expect one using SQL Server so you can run Durable Functions without any dependencies on Azure Storage.

aashish004 commented 3 years ago

I have a client function with an HTTP trigger:

{
  "bindings": [
    {
      "authLevel": "anonymous",
      "name": "req",
      "type": "httpTrigger",
      "direction": "in",
      "route": "orchestrators/{functionName}",
      "methods": [
        "post",
        "get"
      ]
    },
    {
      "name": "$return",
      "type": "http",
      "direction": "out"
    },
    {
      "name": "starter",
      "type": "durableClient",
      "direction": "in"
    }
  ]
}

I get the following error when trying to invoke an orchestrator function:

---> Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcException: Result: Failure Exception: Error: The operation failed with an unexpected status code: 400. Details: {"Message":"One or more of the arguments submitted is incorrect","ExceptionMessage":"The function 'hello-orchestrator' doesn't exist, is disabled, or is not an orchestrator function. Additional info: No orchestrator functions are currently registered!"

marcd123 commented 3 years ago

For those working on Non C# Durable Function Apps, such as Python or JavaScript function apps, here is an example of using the externalClient setting in your HTTP-starter's function.json to enable your HTTP and non-HTTP deployments to reach each other:

https://github.com/microsoft/durabletask-mssql/issues/41#issuecomment-885299181

{
  "bindings": [
    {
      "authLevel": "anonymous",
      "name": "req",
      "type": "httpTrigger",
      "direction": "in",
      "route": "orchestrators/{functionName}",
      "methods": ["post"]
    },
    {
      "name": "$return",
      "type": "http",
      "direction": "out"
    },
    {
      "name": "starter",
      "type": "durableClient",
      "direction": "in",
      "externalClient": true
    }
  ]
}

Setting externalClient to true in your HTTP-starter's function.json will disable the local check for the orchestrator you are trying to trigger, and will allow the requested orchestrator to be scheduled.

cgillum commented 2 years ago

@bhugot would you mind opening a new issue for the inconsistency that you've found? That will make it easier for the team to independently track and fix.

bhugot commented 2 years ago

@cgillum done #2045