Azure / azure-webjobs-sdk

Azure WebJobs SDK
MIT License
739 stars 358 forks source link

ICollector fails - IAsyncCollector has no failure path #2619

Open AdrianPell opened 4 years ago

AdrianPell commented 4 years ago

Description

Messages posted to a queue using ICollector occasionally fail. While this can be caught, IAsyncCollector cannot be used because the error is silently buried with no way to catch it.

Expected behavior

ICollector and IAsyncCollector should either be reliable, or should have clearly documented error paths

Actual behavior

ICollector fails with a StorageException. A similar IAsyncCollector would also fail, but the exception cannot be caught.

Relevant source code snippets

        [FunctionName("SourceFetch_Repost")]
        public async Task<bool> Repost(
            [ActivityTrigger] string message,
            [Queue("foragersourcefetchqueue-legacy")] ICollector<string> legacyQueue)
        {
            logService.Trace($"Repost legacy queue message: {message}");
            var count = 0;
            do
            {
                count++;
                try
                {
                    legacyQueue.Add(message);
                    return true;
                }
                catch (Exception ex)
                {
                    logService.Error($"Report {message} failed - attempt {count}", ex);
                }
                if (count < 5)
                    await Task.Delay(TimeSpan.FromSeconds(10));
            } while (count < 5);
            return false;
        }

Known workarounds

Catching the error thrown by ICollector and retrying works, but is not expected to be neessary.

App Details

If deployed to Azure

The operation ID listed is the call from the orchestration to the activity function shown above. For some reason (sampling?), App Insights does not include the called activity function. There appears to be a gap in telemetry.

cgillum commented 4 years ago

Hi @AdrianPell. This is an issue with the storage queue binding, which is maintained elsewhere, so I'm transferring this issue to the correct repo so that the right person can take a look.

v-anvari commented 3 years ago

Hi @AdrianPell , Apologies for the delayed response, Are you still facing this issue, if yes, can you provide us with the latest error details to understand the cause. Can you provide us with the repro steps to understand what you are trying to achieve.

ghost commented 3 years ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

AdrianPell commented 3 years ago

I'll need to put together a repro for you. Since it only happens if there's an exception, it's a bit of a problem.

Actually, the main issue here is that using an IAsyncCollector doesn't provide any way to catch any exceptions - which can be an problem if they occur. In this case, it makes IAsyncCollector essentially useless.

cgillum commented 3 years ago

Unfortunately the IAsyncCollector implementations are binding specific, so the actual behavior depends on which binding you're using. Some implementations will throw while your code is executing and some will not, like you're observing. Adding @mathewc in case he wants to comment further.

A workaround you can consider is to configure retry policies for your impacted functions, which is a relatively new feature that I think should be helpful for these kinds of situations.

v-anvari commented 3 years ago

Hi @AdrianPell , Let us know if you have any further queries regarding this issue

ghost commented 3 years ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

AdrianPell commented 3 years ago

Anusri

I have long ago moved away from IAsyncCollector to ICollector in order to give me enough awareness of failures, so I don't have repro cases any longer (they were sporadic anyway).

Chris Gillum: thanks for the retry policies, but in this case I don't think it's going to help as failure tolerance is fairly low. In addition, the overhead of the sync vs async operation is minimal for us.

I'd still question the utility of a "may fail" collector ...

Thanks for your help.

Best regards

Adrian

On Tue, May 11, 2021 at 3:00 PM msftbot[bot] @.***> wrote:

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 4 days. It will be closed if no further activity occurs within 3 days of this comment.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Azure/azure-webjobs-sdk/issues/2619#issuecomment-838525933, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA6SIL4WPFSRUFN5PVZLN7LTNE2AJANCNFSM4TCUTS6Q .

v-anvari commented 3 years ago

Tagging @mathewc / @cgillum , for any further comments

v-anvari commented 3 years ago

cc @cgillum / @mathewc , Can you please look into this request