Azure/Azure-Functions

Telemetry: Operation_id used across multiple operations and instances #1171

Open · thomas-schreiter opened this issue 5 years ago

thomas-schreiter commented 5 years ago

The same operation_Id is used for multiple function invocations and on multiple instances in my logs. I am not sure whether this is a bug or a feature, but I would expect the operation id to be unique, i.e. used for at most one function invocation. Could someone explain what operation_Id and operation_ParentId are and how they relate to each other?

Setup: Functions v2 with C#, Durable Functions, and Service Bus triggers.

Example from my logs: operation_Id "n7LLU+NoOp8=" ran on multiple instances, e.g. 5a8fad246c55816174b4eba84beb44c2bba324ebc473273ee4ce50505bdb95ef and 12ea8c4cfaa1172d9e4f5a970a2d02cfddd8b6d2eccf39cd2cda6162f2b013fb. See also the screenshot from the logs below. Let me know if you want the whole table and I can email it to you.

[screenshot: log entries showing the shared operation_Id across instances]
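For anyone wanting to run the same check, I'm counting how widely one id is shared with a query along these lines (a sketch against the standard Application Insights requests table; substitute your own operation_Id):

requests
| where operation_Id == "n7LLU+NoOp8="
// count the invocations and distinct role instances sharing this one id
| summarize invocations = count(), instances = dcount(cloud_RoleInstance), firstSeen = min(timestamp), lastSeen = max(timestamp)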

fhtino commented 5 years ago

Same problem here. Functions v2, C#, a timer-triggered function that runs every minute.

Since about 12 hours ago, every run has been traced with the same operation_Id. This results in zero "requests" over the last few hours; the graph is empty.

[screenshot: empty requests graph]
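That empty graph corresponds to a simple request count over time, something like:

requests
| where timestamp > ago(12h)  // roughly when the problem started
| summarize count() by bin(timestamp, 1h)
| render timechart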

If I look at the trace items with

traces
| where operation_Id == "84bc946d0176734fa...."
| order by timestamp desc

I can see 2000 items and counting, and I can see that the function is running.

[screenshot: trace items sharing one operation_Id]

It seems that the Azure Functions engine is stuck on the same operation_Id instead of creating a new request, with its own operation_Id and related traces, for every execution.
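A quick way to see how concentrated the telemetry has become (a sketch; adjust the time window as needed):

traces
| where timestamp > ago(12h)
| summarize count() by operation_Id
| top 10 by count_  // the stuck operation_Id should dominate the list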

One more detail, which may or may not be related: the parent request is strange. Its url field is filled with "http://--myapp--.azurewebsites.net/admin/host/ping" instead of an empty value.

[screenshot: the parent request with the /admin/host/ping URL]

peterwain commented 5 years ago

I'm also experiencing this issue on Functions v2 (C#) with a timer trigger. The trigger fires every five seconds, and we end up with a lot of telemetry associated with a single operation_Id.

The following query and graph show the pattern over time.

traces
| where timestamp > ago(24h) and operation_Name == 'SyncBatcher'
| summarize count() by operation_Id, bin(timestamp, 1h)
| render timechart

[screenshot: timechart of trace counts by operation_Id]

Running the same query and summarizing by cloud_RoleInstance yields exactly the same pattern; the operation_Id seems to change only when a new underlying VM starts handling the timer-trigger executions.
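That by-instance variant, for reference:

traces
| where timestamp > ago(24h) and operation_Name == 'SyncBatcher'
| summarize count() by cloud_RoleInstance, bin(timestamp, 1h)
| render timechart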

elglogins commented 1 year ago

Any updates here? I'm still seeing the same behavior on v4.

kubaodias commented 1 year ago

Any progress on this? It basically makes Transaction Search for Function App invocations not very helpful.

cmcconomyfwig commented 8 months ago

This is a problem we are experiencing as well.

rashika1809 commented 5 months ago

@fhtino, any update on your question here? Were you able to solve it? I am facing the same scenario and looking for a resolution.

andrewpaynter commented 3 months ago

I am also still experiencing the same issue.

mrshanahan commented 1 month ago

I'd never seen this before, but I just encountered it while attempting to match up Service Bus-triggered function calls with trace logs. Several requests had the same operation_Id for a period of about 30 minutes:

[screenshots: several unrelated requests sharing the same operation_Id]

We do have our Service Bus queue set up to redeliver failed messages, but several of these are unrelated requests, and it appears that redeliveries typically get a new operation_Id.

Furthermore, it appears that this issue occurs quite often. Here's a collection of operation_Ids we've seen over the past 7 days:

[screenshot: per-operation_Id counts with start and end timestamps over the past 7 days]

(EDIT: Added start and end timestamps for each operation_Id to show that these occur in "runs".) Are the expectations around operation_Id documented anywhere? Is this a bug, or is it just some weird timing issue?

We're running .NET isolated functions on version 4, FWIW. Target framework is 4.7.2.
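In case anyone wants to check their own data, a query roughly like the following reproduces the 7-day table above (a sketch; the count_ > 1 filter keeps only shared ids):

requests
| where timestamp > ago(7d)
| summarize count(), started = min(timestamp), ended = max(timestamp) by operation_Id
| where count_ > 1  // only operation_Ids shared by multiple requests
| order by count_ desc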