Azure / azure-webjobs-sdk

Azure WebJobs SDK
MIT License

Queued BlobTrigger executions not validated again before execution #1112

Open lindydonna opened 7 years ago

lindydonna commented 7 years ago

From @FinVamp1 on April 12, 2017 18:34

Summary: Two WebJobs use the same Storage account. WebJob A has a Blob Trigger for abc/{name} against Container A. WebJob B has a Blob Trigger for abc/{name} on Container B.

Issue: Sometimes a WebJob picks up blobs from the other container.

    Executing: 'Functions.Run' - Reason: 'New blob detected: abcde/Farrow Archive/Room2007-10/804-01.org'
    Function had errors. See Azure WebJobs SDK dashboard for details. Instance ID is 'c3ed6b97-e337-414c-a09f-23b04f8660b7'
    Executing: 'Functions.Run' - Reason: 'New blob detected: abcde/Farrow Archive/Room2007-07/757-01.org'
    Executing: 'Functions.Run' - Reason: 'New blob detected: foobar/Farrow Archive/Room 2004-03/082-01.Bak'
    Microsoft.Azure.WebJobs.Host.FunctionInvocationException: Exception while executing function: Functions.Run ---> System.InvalidOperationException: Exception binding parameter 'blobName' ---> System.InvalidOperationException: Binding data does not contain expected value 'blobName'.
       at Microsoft.Azure.WebJobs.Host.Bindings.Data.ClassDataBinding`1.BindAsync(BindingContext context)

The WebJobs are running locally and not in an Azure Web Site.

There are two ways to reproduce this.

  1. With a single WebJob:
     1. Upload a folder (using Azure Storage Explorer) containing thousands of files (he has about 15,000 files in his case) to the blob container the WebJob is listening to, and wait for the WebJob function to get triggered. Alternatively, use the attached folder with this many files.
     2. While the upload is still ongoing and the function is processing each blob as it comes in, kill the WebJob.
     3. Change the code to a different container, compile, and run the job again.
     4. If the problem does not happen, change it back to the right blob container name and run again to see the job continue to process the new files as they come in.
     5. Repeat steps 2-4 until the problem repros.

  2. Try with 2 web jobs as described in the actions.

Copied from original issue: Azure/azure-webjobs-sdk-script#1400

lindydonna commented 7 years ago

From @brettsam on April 14, 2017 17:44

Behind the scenes, the blob trigger discovers new blobs and adds them to a queue. From there, the (potentially multiple) WebJobs instances monitor the queue and pull the individual blobs for processing. Once the blob metadata is in the queue, we never check again that it's going to the right function.

So:

  1. Thousands of blobs are inserted.
  2. Those thousands of blobs result in thousands of queue messages that contain the details about the blob as well as the function that should process them.
  3. Before those messages are completely processed, the host shuts down.
  4. The container name of the BlobTrigger is changed, but the function name is not.
  5. When the host starts back up, it starts processing those queue messages and routing the blob processing to the function with the matching function name. In this case, the path has changed, but we don't notice.
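
The sequence above can be sketched in a few lines. This is a language-neutral illustration in Python, not the SDK's C# code; `BlobMessage`, `drain`, and the `registrations` dictionary are hypothetical names invented for the example, and only the container names and the function name `Functions.Run` come from the log in the original report.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BlobMessage:
    # Hypothetical stand-in for the queued blob metadata.
    function_id: str
    container: str
    blob_name: str


def drain(queue, registrations):
    """Dispatch every queued message by function ID only.

    `registrations` maps function ID -> container the trigger currently
    watches. The container is deliberately never consulted here, which
    mirrors the behavior described above.
    """
    dispatched = []
    while queue:
        msg = queue.popleft()
        if msg.function_id in registrations:
            dispatched.append((msg.function_id, f"{msg.container}/{msg.blob_name}"))
    return dispatched


# Steps 1-2: blobs land in "abcde" and are queued for Functions.Run.
queue = deque(BlobMessage("Functions.Run", "abcde", f"file{i}.org") for i in range(3))

# Steps 3-4: the host restarts with the trigger pointing at "foobar",
# but the function name is unchanged.
registrations = {"Functions.Run": "foobar"}

# Step 5: the stale "abcde" messages are still routed to Functions.Run.
stale = drain(queue, registrations)
```

Every entry in `stale` refers to a blob in `abcde`, even though the function now declares a trigger on `foobar`.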

This can probably be improved, but one workaround is to change the function names when you change the container names.

lindydonna commented 7 years ago

@FinVamp1 FYI, WebJobs-specific issues go in this repo.

Moving to backlog as it is quite a corner case.

mathewc commented 6 years ago

Brett is right: the issue is that when we process queued blob executions, we base the dispatch on function ID only and don't recheck that the path details still match. This is a simple correctness change we can make. For example, we could add the target container name/details to the BlobQueueRegistration. ContainerName/Path are already part of the BlobTriggerMessage, so we can simply compare at that point.
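
A minimal sketch of that comparison, written in Python for brevity rather than the SDK's C#. The `Registration` and `QueuedBlob` types and the `should_execute` helper are hypothetical stand-ins loosely modeled on the BlobQueueRegistration and BlobTriggerMessage mentioned above; their fields are assumptions, and a real check would likely also need to match the blob path pattern, not just the container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    # Hypothetical: what the host currently knows about a function's trigger.
    function_id: str
    container_name: str


@dataclass(frozen=True)
class QueuedBlob:
    # Hypothetical: metadata carried by a queued blob execution.
    function_id: str
    container_name: str
    blob_name: str


def should_execute(message, registrations):
    """Return True only if the function exists AND still watches the
    container the message was queued for."""
    registration = registrations.get(message.function_id)
    if registration is None:
        return False
    return registration.container_name == message.container_name


registrations = {"Functions.Run": Registration("Functions.Run", "foobar")}

stale = QueuedBlob("Functions.Run", "abcde", "Farrow Archive/Room2007-10/804-01.org")
fresh = QueuedBlob("Functions.Run", "foobar", "Farrow Archive/Room 2004-03/082-01.Bak")

results = [should_execute(stale, registrations), should_execute(fresh, registrations)]
```

With this check, the message queued under the old container is rejected and the one matching the current container is accepted. What to do with rejected messages (drop, log, or poison-queue them) is a separate design choice that this sketch leaves open.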

Likely this also affects the new SharedQueue work @watashiSHUN did.

watashiSHUN commented 6 years ago

Hmm, it might be more complicated for SharedQueue. In the case of EventGrid, since we are not doing any polling, I won't notice when the user changes the endpoint (from the EventGrid blade).