Closed RDavis3000 closed 2 months ago
Thank you for your feedback. Tagging and routing to the team member best able to assist.
The issue appears to be that the Event Grid service is not delivering events to the function in a timely manner. @RDavis3000 can you confirm that the deadlettered events were never delivered to the function? If that is the case, then we would need to ask for assistance from the service team in troubleshooting this.
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jfggdl.
Author: | RDavis3000 |
---|---|
Assignees: | JoshLove-msft |
Labels: | `Event Grid`, `Service Attention`, `Client`, `customer-reported`, `question`, `needs-team-attention` |
Milestone: | - |
Hi @JoshLove-msft , thanks for picking this up.
I can find that some dead-lettered events were in fact delivered and processed correctly by my function:
DeadLetter json: [ { "id": "b7a981e2-91e4-44bb-8b6a-0746fff9b20e", "eventTime": "2022-08-11T07:39:18.3285837Z", "eventType": "InstanceManager.StepExecutionComplete", "dataVersion": "1.0", "metadataVersion": "1", "topic": "/subscriptions/814f37ac-6bb0-4108-9c29-72909f9eef1e/resourceGroups/rg-u-lxr-aue-ingestion/providers/Microsoft.EventGrid/topics/egt-u-lxr-aue-ingestion-doors", "subject": "InstanceManager.StepExecutionComplete", "deadLetterReason": "TimeToLiveExceeded", "deliveryAttempts": 2, "lastDeliveryOutcome": "TimedOut", "publishTime": "2022-08-11T07:39:18.2810770Z", "lastDeliveryAttemptTime": "2022-08-12T18:00:08.0839967Z", "data": { "StepId": "3f3a1188-828f-4007-a2f0-6ef9e2be8050", "InstanceId": "1fa3209d-a3df-4664-b427-06ff5b31e2a7", "Details": "Called by 3f3a1188-828f-4007-a2f0-6ef9e2be8050 because SELF CODE COMPLETE" } } ]
Processed at 2022-08-12T18:00:08.1721073Z Operation ID: ab1d67293751c72bcf45c4984ee2d52f
So that's weird. Given the somewhat messy nature of event duplication is it possible that two events were emitted and one was dead lettered and the other processed? Can that happen?
Here is another dead-letter record for an event which was actually delivered and processed:
[ { "id": "11168dd3-ba8c-4058-9f5b-c52535eca590", "eventTime": "2022-08-11T14:20:29.9733682Z", "eventType": "DOORS.ArtifactHeadersRemain", "dataVersion": "1.0", "metadataVersion": "1", "topic": "/subscriptions/814f37ac-6bb0-4108-9c29-72909f9eef1e/resourceGroups/rg-u-lxr-aue-ingestion/providers/Microsoft.EventGrid/topics/egt-u-lxr-aue-ingestion-doors", "subject": "DOORS.ArtifactHeadersRemain", "deadLetterReason": "MaxDeliveryAttemptsExceeded", "deliveryAttempts": 8, "lastDeliveryOutcome": "TimedOut", "publishTime": "2022-08-11T14:20:29.9683774Z", "lastDeliveryAttemptTime": "2022-08-11T15:24:49.0072095Z", "data": { "IngestionRequestInstanceId": "1fa3209d-a3df-4664-b427-06ff5b31e2a7", "SourceIngestionRequestInstanceStepId": "96124739-4179-4986-bed6-e7416ee534b2", "EventModel": { "ProjectId": "_pzbw0FU3Eeu0u82irxw9ng", "Id": 0, "SourceUri": "https://lxra.clm.ibmcloud.com/rm/publish/resources/*?modifiedSince=1901-01-01T00%3A00%3A00.000Z&scoped=true&size=100&projectURI=_pzbw0FU3Eeu0u82irxw9ng&token=_AieMkBk5Ee2ItfaVDUgCYA&page=75" } } } ] Kusto smoking gun:
union
traces
| where timestamp > datetime("2022-08-10T02:56:23.169Z") and timestamp < datetime("2022-08-17T02:56:23.169Z")
| where * has "Child"
and * has "instance"
and * has "step"
and * has "created"
and * has "IngestionRequestInstanceId:1fa3209d-a3df-4664-b427-06ff5b31e2a7"
and * has "URL:"
and * has "https://lxra.clm.ibmcloud.com/rm/publish/resources/*?modifiedSince=1901-01-01T00%3A00%3A00.000Z&scoped=true&size=100&projectURI=_pzbw0FU3Eeu0u82irxw9ng&token=_AieMkBk5Ee2ItfaVDUgCYA&page=75"
| order by timestamp desc
| take 100
For the whole operation:
union
*
| where operation_Id == '51e924a16055a7e45f50c6f53bf326ac'
A bit more digging shows this record looks like it was delivered 7 times. operationId 51e924a16055a7e45f50c6f53bf326ac processed the event, for the other 6 the my code detected the duplicates and ignored them.
let opids = traces
| where timestamp > datetime("2022-08-10T02:00:41.207Z") and timestamp < datetime("2022-08-17T02:00:41.207Z")
| where * has "projectURI=_pzbw0FU3Eeu0u82irxw9ng&token=_AieMkBk5Ee2ItfaVDUgCYA&page=75"
| distinct operation_Id
| project operation_Id;
union isfuzzy=true
requests,
exceptions,
traces
| where operation_Id in (opids)
| order by operation_Id, timestamp asc
But the majority have not been delivered, like this one with only one attempt in 24 hours:
{
"id": "2e669d53-2b92-4af4-8808-63df0a1451f9",
"eventTime": "2022-08-11T06:14:10.8365432Z",
"eventType": "InstanceManager.StepExecutionComplete",
"dataVersion": "1.0",
"metadataVersion": "1",
"topic": "/subscriptions/814f37ac-6bb0-4108-9c29-72909f9eef1e/resourceGroups/rg-u-lxr-aue-ingestion/providers/Microsoft.EventGrid/topics/egt-u-lxr-aue-ingestion-doors",
"subject": "InstanceManager.StepExecutionComplete",
"deadLetterReason": "TimeToLiveExceeded",
"deliveryAttempts": 1,
"lastDeliveryOutcome": "Probation",
"publishTime": "2022-08-11T06:14:10.8206787Z",
"lastDeliveryAttemptTime": "2022-08-11T06:14:10.8362971Z",
"data": {
"StepId": "c305c2fb-5782-484d-aca7-3fb2089744cd",
"InstanceId": "1fa3209d-a3df-4664-b427-06ff5b31e2a7",
"Details": "Called by 59eaf11b-94b7-42ea-85e0-670fad25c103 because SELF CODE COMPLETE *** Called by 59eaf11b-94b7-42ea-85e0-670fad25c103 because all finished ChildrenCount:0 HasChildren:False State:Finished *** Called by 8045ab1a-ec40-4bb5-bdc7-d802539edd60 because all finished ChildrenCount:2 HasChildren:True State:Finished"
}
}
Let me know if you want more info
@m-redding @JoshLove-msft
This seems to have dropped off the queue, can it be assigned to someone else?
Hi @RDavis3000, Since the root cause appears to be that the Event Grid service is not delivering events quickly enough, we've marked the issue as needing service attention. Another option at your disposal would be to file a support ticket from the Azure Portal.
Hi @RDavis3000, we deeply appreciate your input into this project. Regrettably, this issue has remained unresolved for over 2 years and inactive for 30 days, leading us to the decision to close it. We've implemented this policy to maintain the relevance of our issue queue and facilitate easier navigation for new contributors. If you still believe this topic requires attention, please feel free to create a new issue, referencing this one. Thank you for your understanding and ongoing support.
Library name and version
Microsoft.Azure.Webjobs.Extensions.EventGrid 3.1.0
Query/Question
I have a data ingestion project which uses Event Grid with Azure Function subscribers/handlers/publishers, basically a function does its part of the data processing and then publishes an event which is picked up by the next function(s) in the chain. When doing a large ingestion (couple of million records) starting at 2022-08-15T06:11:33.73Z, the performance seems good initially, but after an initial spike of activity the functions seem to receive fewer and fewer event grid trigger hits and it just crawls along. After 24 hours I got a sudden load of 800K dead lettered events because their TTL had expired. The function app CPU rarely went above 40% and it has auto scale for CPU above 70%, running on a Premium Plan. I'm not sure why the event grid didn't push the functions harder considering there were hundreds of thousands of events waiting to be delivered. I've attached the relevant metrics, happy to share any other information required.
Function App: /subscriptions/814f37ac-6bb0-4108-9c29-72909f9eef1e/resourceGroups/rg-u-lxr-aue-ingestion/providers/Microsoft.Web/sites/fa-u-lxr-aue-ingestion-doors
Event Grid Topic: /subscriptions/814f37ac-6bb0-4108-9c29-72909f9eef1e/resourceGroups/rg-u-lxr-aue-ingestion/providers/Microsoft.EventGrid/topics/egt-u-lxr-aue-ingestion-doors
My query is basically how can I get EventGrid to deliver more events, faster, to the function triggers? And are you able to take a look behind the scenes and tell me if something was going wrong which I don't have visibility of?
EventGridMetrics.xlsx
Environment
Project Assemblies: Azure.Messaging.EventGrid 4.8.2 Microsoft.Azure.Webjobs.Extensions.EventGrid 3.1.0 Microsoft.NET.Sdk.Functions 4.1.9 Microsoft.Azure.Functions.Extensions 1.1.0
Running Azure Functions runtime ~4
dotnet --info .NET SDK (reflecting any global.json): Version: 6.0.303 Commit: 7494fee64a
Runtime Environment: OS Name: Windows OS Version: 10.0.22000 OS Platform: Windows RID: win10-x64 Base Path: C:\Program Files\dotnet\sdk\6.0.303\
global.json file: Not found
Host: Version: 6.0.8 Architecture: x64 Commit: 55fb7ef977
.NET SDKs installed: 3.1.422 [C:\Program Files\dotnet\sdk] 5.0.409 [C:\Program Files\dotnet\sdk] 6.0.301 [C:\Program Files\dotnet\sdk] 6.0.303 [C:\Program Files\dotnet\sdk]
.NET runtimes installed: Microsoft.AspNetCore.App 3.1.26 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 3.1.28 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 5.0.17 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 6.0.6 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 6.0.8 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.NETCore.App 2.1.30 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 3.1.26 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 3.1.28 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 5.0.17 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 6.0.6 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 6.0.8 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.WindowsDesktop.App 3.1.26 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 3.1.28 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 5.0.17 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 6.0.6 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 6.0.8 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
Download .NET: https://aka.ms/dotnet-download
Learn about .NET Runtimes and SDKs: https://aka.ms/dotnet/runtimes-sdk-info