Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

How to investigate messages being stuck and then being moved to poison queue? #10357

Closed eddynaka closed 1 month ago

eddynaka commented 1 month ago

Is your question related to a specific version? If so, please specify:

What language does your question apply to? (e.g. C#, JavaScript, Java, All)

C#

Question

I have a queue-triggered Azure Function: whenever a message is added to the queue, the function should process it.

We observed that messages get stuck: when I look at the Azure Storage queue, the messages are not visible, but I can see that they exist (the Azure portal shows something like 0 of 1000).

Then, after a few minutes, the message is moved to the poison queue.

No exceptions are logged, so I can't tell if this is a bug in my code or somewhere else. How do I investigate such issues?

Thanks!

kshyju commented 1 month ago

I can't tell if this is a bug in my code or somewhere else

Is there sufficient logging in your function method to determine what is causing the incorrect behavior? Did you check your Application Insights data? That should tell you whether the function was invoked (with the queue message) or not.

eddynaka commented 1 month ago

I don't have a dedicated log line for when the Azure Function starts, reaches its midpoint, or finishes.

But, besides that, do we have any other way of investigating? If we had timeouts, Application Insights would log the 30-minute default timeout as an exception, which I'm not seeing.

Either way, I will make a change on my side to add more logging, but I'm wondering if you can see something based on the invocationId.

eddynaka commented 1 month ago

I just verified, and I do log when the function is called and when it leaves my code. Looking at Application Insights, I don't see either of those entries for the stuck messages.

I just see messages like:

{
  "Category": "Microsoft.Azure.WebJobs.Host.Queues.QueueProcessor",
  "LogLevel": "Warning",
  "HostInstanceId": "6886f40d-e2f4-4b4d-87cc-44179e8398ed",
  "ProcessId": "4236",
  "prop__{OriginalFormat}": "Message has reached MaxDequeueCount of 5. Moving message to queue 'matchqueue-poison'."
}

OperationId/ParentId are empty.
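
For context, the "MaxDequeueCount of 5" in that warning corresponds to the queue trigger's `maxDequeueCount` setting in host.json, which defaults to 5. The fragment below is a sketch of where that setting lives, assuming a Functions runtime that uses the version 2.0 host.json schema. The `visibilityTimeout` value shown is illustrative only and is not taken from this app's configuration.

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "maxDequeueCount": 5,
      "visibilityTimeout": "00:00:30"
    }
  }
}
```

Here `maxDequeueCount` is the number of processing attempts before a message is moved to the `<queuename>-poison` queue, and `visibilityTimeout` is the delay between retries after a failed attempt.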

kshyju commented 1 month ago

For deeper investigations specific to your app, please consider opening a support ticket.

AartBluestoke commented 1 month ago

I believe one way to get open-ended, unlogged functions is an accidental offload to an unmonitored secondary thread, e.g. an unawaited async call.

The original function just returns, with the actual work (including any logging that should have executed if the task execution continued) not executed.
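
The pattern described above can be sketched with plain tasks. This is a minimal illustration, not code from the app in question and not the Functions SDK: `UnawaitedDemo`, `ProcessAsync`, `HandleFireAndForget`, and `HandleAwaited` are hypothetical names, and a `ConcurrentQueue<string>` stands in for the logger.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class UnawaitedDemo
{
    // Hypothetical stand-in for the real work a queue-triggered function would do.
    private static async Task ProcessAsync(string message, ConcurrentQueue<string> log)
    {
        await Task.Delay(200);                 // simulate I/O
        log.Enqueue($"processed {message}");   // the "logging" happens only after the work
    }

    // Problem: the task is started but never awaited, so the handler returns
    // before the work (and its logging) has run, and any exception goes unobserved.
    public static Task HandleFireAndForget(string message, ConcurrentQueue<string> log)
    {
        _ = ProcessAsync(message, log);
        return Task.CompletedTask;
    }

    // Fix: await the work so the caller observes completion and exceptions.
    public static async Task HandleAwaited(string message, ConcurrentQueue<string> log)
    {
        await ProcessAsync(message, log);
    }

    public static async Task Main()
    {
        var log = new ConcurrentQueue<string>();

        await HandleFireAndForget("m1", log);
        Console.WriteLine($"Entries logged when fire-and-forget handler returned: {log.Count}");

        await HandleAwaited("m2", log);
        bool loggedM2 = log.Contains("processed m2");
        Console.WriteLine($"m2 logged when awaited handler returned: {loggedM2}");
    }
}
```

Searching the function code for tasks that are started without `await` (including discards like `_ =` and calls whose returned `Task` is ignored) is one way to check for this pattern.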

eddynaka commented 1 month ago

I believe one way to get open-ended, unlogged functions is an accidental offload to an unmonitored secondary thread, e.g. an unawaited async call.

The original function just returns, with the actual work (including any logging that should have executed if the task execution continued) not executed.

I wondered about that, but I looked at the code and couldn't find any case like this. I will look again. Thanks for the suggestion!

kshyju commented 1 month ago

I am closing this as there is no action item left here. Please open an ICM and the team will be happy to investigate.