Azure / azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
MIT License
714 stars 270 forks source link

Mismatch in orchestrator data in Azure Storage & Azure Portal #2000

Open adeliab opened 2 years ago

adeliab commented 2 years ago

Description

I've been having difficulty matching orchestrator data in Azure Storage & Azure Portal. Some examples

The log analytics query I use is the one generated when you go to function app > function > monitor > orchestrations > run query in application insights

Expected behavior

I can find matching orchestrators in azure storage & azure portal. Failed orchestrators are displayed as failed

Actual behavior

I can't find orchestrator in azure portal. Failed status is not found

Relevant source code snippets

N/A

Known workarounds

None

App Details

Screenshots

example running function not found azure storage image azure portal image

If deployed to Azure

We have access to a lot of telemetry that can help with investigations. Please provide as much of the following information as you can to help us investigate!

If you don't want to share your Function App or storage account name GitHub, please at least share the orchestration instance ID. Otherwise it's extremely difficult to look up information.

cgillum commented 2 years ago

@adeliab can you share the full query, including the name of the collection you're querying. It's okay to block out any info you don't want shared, but it would be helpful to see the full structure of your query including the data source.

@anthonychu or @davidmrdavid can you assist?

adeliab commented 2 years ago

@cgillum this is the query I used. basically what this button generates, and i added the extra where clause to filter on instance ID and change the time range in the log analytics image

requests
| extend
    DurableFunctionsInstanceId = tostring(customDimensions['DurableFunctionsInstanceId']),
    DurableFunctionsRuntimeStatus = tostring(customDimensions['DurableFunctionsRuntimeStatus']),
    DurableFunctionsType = tostring(customDimensions['DurableFunctionsType'])
| where DurableFunctionsType == 'Orchestrator'
| project
    timestamp,
    id,
    name,
    operation_Name,
    cloud_RoleName,
    DurableFunctionsRuntimeStatus,
    DurableFunctionsType,
    DurableFunctionsInstanceId
| where DurableFunctionsRuntimeStatus != 'Terminated' and name == 'xxxxx'
| union (requests
    | extend
        DurableFunctionsInstanceId = tostring(customDimensions['DurableFunctionsInstanceId']),
        DurableFunctionsRuntimeStatus = tostring(customDimensions['DurableFunctionsRuntimeStatus']),
        DurableFunctionsType = tostring(customDimensions['DurableFunctionsType'])
    | where DurableFunctionsType == 'Orchestrator'
    | where DurableFunctionsRuntimeStatus == 'Terminated' and DurableFunctionsInstanceId in ((requests
        | extend
            DurableFunctionsInstanceId = tostring(customDimensions['DurableFunctionsInstanceId']),
            DurableFunctionsRuntimeStatus = tostring(customDimensions['DurableFunctionsRuntimeStatus']),
            DurableFunctionsType = tostring(customDimensions['DurableFunctionsType'])
        | where DurableFunctionsType == 'Orchestrator'
        | where DurableFunctionsRuntimeStatus != 'Terminated' and name == 'xxxxx'
        | distinct DurableFunctionsInstanceId)))
| where cloud_RoleName =~ 'xxxxx' and operation_Name =~ 'xxxxx'
| where DurableFunctionsInstanceId == '3b7910c5224749559b7557c7bb09aa0d'
| summarize arg_max(timestamp, *) by DurableFunctionsInstanceId
| order by timestamp desc
| take 20
davidmrdavid commented 2 years ago

Thanks for reaching out, @adeliab. I'll be trying to reproduce this tomorrow morning, I'll look to provide an update shortly after!

davidmrdavid commented 2 years ago

Just a quick update here. I was experimenting with this and it doesn't seem like the DF custom tags are being sent to Application Insights, which is very strange.

I'll need to sync with @anthonychu as I recall doing some work on this with him. I'll continue updating this thread as we learn more about what's going on here.

adeliab commented 2 years ago

@davidmrdavid any update on this issue?

davidmrdavid commented 2 years ago

Hi @adeliab,

Apologies for the late response, we were having a hard time reproducing this reliably but now we know that's because this feature appears to work with some (older) versions of the extension, but not with newer ones.

I'm able to reproduce this now, but it'll take some time to fix because this issue likely requires some collaboration with the Application Insights folks to understand why this isn't working anymore. I'll get back to you as soon as we have more updates.

adeliab commented 2 years ago

hi @davidmrdavid any update on this issue?

jense789 commented 8 months ago

hi @davidmrdavid!

It seems like I have a similar issue this issue two years later. I ran a Python DF, which threw an exception in the first activity.

I am seeing the orchestration run's results in LogAnalytics and thus in Monitor/Orchestration Traces in the portal, but the status differs from what the DF status endpoint itself says (which is correct):

https://---.azurewebsites.net/runtime/webhooks/durabletask/instances/8f25ff754b3e468ca3544731def240cf?taskHub=---&connection=Storage&code=---

says

{
"name": "checksynapse",
"instanceId": "8f25ff754b3e468ca3544731def240cf",
"runtimeStatus": "Failed",
"input": null,
...
}

while

requests
| extend
    DurableFunctionsInstanceId = tostring(customDimensions['DurableFunctionsInstanceId']),
    DurableFunctionsRuntimeStatus = tostring(customDimensions['DurableFunctionsRuntimeStatus']),
    DurableFunctionsType = tostring(customDimensions['DurableFunctionsType'])
| where DurableFunctionsType == 'Orchestrator'
| where DurableFunctionsInstanceId == '8f25ff754b3e468ca3544731def240cf'

returns only 1 row saying DurableFunctionsRuntimeStatus == "Running" Orchestration Traces -> Monitoring is wrong, too, because it uses this data.

It seems LogAnalytics didn't receive the final "Terminated" event?

Via Storage Browser on the ...Instances table, it correctly shows RuntimeStatus = "Failed" for partitionkey = "8f25ff754b3e468ca3544731def240cf"

(I have a faint idea that this inconsistency might have been coming from me re-publishing the function app before some lagging polling process issued the Terminate event to LogAnalytics... )

Regards Jens