Azure / azure-webjobs-sdk

Azure WebJobs SDK
MIT License

Add ability to create blob triggers on container names that match a pattern #779

Open kaleemxii opened 8 years ago

kaleemxii commented 8 years ago

Currently there is no easy way to create blob triggers on a list of containers that match a name pattern, or on all the containers in a storage account for that matter. Having that would be really useful, and I don't see much change required in the implementation: the trigger logic is already based on storage log analysis, so instead of looking for a specific container name it could look for any container, or pattern-match on the container name, and trigger accordingly.

I would be happy to contribute to this if required; let me know.

christopheranderson commented 8 years ago

Feel free to take a shot at this. I've heard similar requests. We have concerns about how this would perform, so we'll likely be picky about any PRs, to make sure they don't adversely impact performance.

rahulrai-in commented 7 years ago

I can take this up if @kaleemxii is not working on this already.

kaleemxii commented 7 years ago

@moonytheloony feel free to pick it up. I've not started on this yet, but I have implemented a $logs blob-tracking Azure Function to accommodate this particular scenario.

rahulrai-in commented 7 years ago

@christopheranderson I propose adding an attribute on the WebJob function that would run a regular expression before the function is triggered, e.g. something in the following format:

  // Proposed attribute (not part of the SDK): trigger only when {name} matches the expression.
  [TriggerExpression("{name}", @"\w*")]
  public static void WriteLog(
      [BlobTrigger("input/{name}")] string logMessage,
      string name,
      TextWriter logger)
  {
      logger.WriteLine("Blob name: {0}", name);
      logger.WriteLine("Content:");
      logger.WriteLine(logMessage);
  }

Does this approach look good?

petersondrew commented 7 years ago

@moonytheloony So that example would support an additional regular expression on the blob name, correct? I'm interested in the use case proposed by @kaleemxii, where multiple listeners are created, one for each container that matches the wildcard. It seems the biggest hurdles are performance and, for a continuous job, handling containers that are added after the job starts. Do you have any thoughts on that?

rahulrai-in commented 7 years ago

@petersondrew: Thank you for adding your thoughts. I am very happy to contribute to the SDK development. Handling the case where the function should be triggered when a blob is added should be pretty straightforward to implement and should not incur performance overhead. I propose that we use the existing mechanism which checks for new blobs in the specified container and just before it executes the underlying function, run the blob name through the RegEx.

However, you are correct in saying that there would be a performance overhead associated with performing this operation at container level (I think that's why the SDK doesn't support triggers at container level).

I can work with you to design a strategy for that as well. This would involve retrieving create-container operations (along with the blob operations that we retrieve today). Should I work on the basic functionality first, and we work our way up from there?

petersondrew commented 7 years ago

That would be great. I'm interested in implementing a solution for handling containers with not only a regular expression, but also a pattern, e.g. [BlobTrigger("container-{containerId}/{blobName}")]. This would allow one function definition to be triggered for any blob added to any container that matches the pattern, and to receive the containerId and blobName from my example as arguments.

rahulrai-in commented 7 years ago

@petersondrew Again, the approach I demonstrated previously would let you specify a RegEx for containerId in your case. For example, you could trigger the function only when containerId starts with a letter: container-A1004 would match but container-1004 would not. The regex would also map A1004 to the containerId that you can use in your function. Would you like to get started with the implementation?
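To make the idea concrete, here is a minimal sketch in Python (not SDK code) of how a binding-style path such as container-{containerId}/{blobName} could be compiled into a regular expression with an optional per-placeholder constraint. The function names `pattern_to_regex` and `match_path` and the `constraints` argument are hypothetical and only illustrate the matching step discussed above.

```python
import re


def pattern_to_regex(path_pattern, constraints=None):
    """Compile a binding-style path like 'container-{containerId}/{blobName}'.

    `constraints` (hypothetical) maps a placeholder name to a regex fragment.
    Unconstrained placeholders match a single path segment, except a trailing
    placeholder, which may contain '/' so blobs in virtual folders still match.
    """
    constraints = constraints or {}
    # re.split with one capture group alternates: literal, name, literal, ...
    tokens = re.split(r"\{(\w+)\}", path_pattern)
    parts = []
    for i, token in enumerate(tokens):
        if i % 2 == 0:
            parts.append(re.escape(token))  # literal text between placeholders
        else:
            is_trailing = i == len(tokens) - 2 and tokens[-1] == ""
            default = ".+" if is_trailing else "[^/]+"
            fragment = constraints.get(token, default)
            parts.append("(?P<{}>{})".format(token, fragment))
    return re.compile("".join(parts))


def match_path(compiled, path):
    """Return the placeholder values for `path`, or None if it does not match."""
    match = compiled.fullmatch(path)
    return match.groupdict() if match else None


# containerId must start with a letter, as in the example above.
regex = pattern_to_regex(
    "container-{containerId}/{blobName}",
    {"containerId": r"[A-Za-z]\w*"},
)
print(match_path(regex, "container-A1004/report.csv"))
print(match_path(regex, "container-1004/report.csv"))
```

In this sketch the first call yields the containerId and blobName values, and the second yields None because 1004 does not start with a letter.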

rahulrai-in commented 7 years ago

@christopheranderson Would you like to add some comments on behalf of the Microsoft team?

MisinformedDNA commented 7 years ago

I would like to use this to separate various research groups into their own container. I'm using Azure Functions and one solution I see would be something like changing from

  "bindings": [
    {
      "name": "stream",
      "type": "blobTrigger",
      "direction": "in",
      "path": "container1/{filename}",
      "connection": "AzureWebJobsStorage"
    }
  ]

to


  "bindings": [
    {
      "name": "stream",
      "type": "blobTrigger",
      "direction": "in",
      "path": ["container1/{filename}", "container2/{filename}"],
      "connection": "AzureWebJobsStorage"
    }
  ]

jordanwallwork commented 7 years ago

Has there been any movement on this? We've been using separate containers per customer account to store customer images, and we've now been asked to perform some image processing on each image as it's uploaded. I want to use Azure Functions with a blob trigger, but I'd need to be able to trigger using a pattern on the container name.

CrazyTuna commented 6 years ago

I've tried Event Grid, and you can solve this problem by reacting to Blob Storage events. You can also filter events based on container prefix. You can route all your events to a queue and then add a blob input binding.
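As a rough illustration of the prefix filtering, the Python sketch below selects blob-created events by container prefix. It assumes the events arrive as dictionaries carrying the Event Grid `eventType` and `subject` fields, with blob subjects of the form /blobServices/default/containers/<container>/blobs/<blob>. The helper names are hypothetical. In practice the same filter can usually be configured on the Event Grid subscription itself rather than in code.

```python
SUBJECT_PREFIX = "/blobServices/default/containers/"


def parse_blob_subject(subject):
    """Split an Event Grid blob subject into (container, blob), or return None."""
    if not subject.startswith(SUBJECT_PREFIX):
        return None
    # Container names cannot contain '/', so the first '/blobs/' is the separator.
    container, sep, blob = subject[len(SUBJECT_PREFIX):].partition("/blobs/")
    if not sep or not container or not blob:
        return None
    return container, blob


def select_blob_created(events, container_prefix):
    """Return (container, blob) pairs for created blobs in matching containers."""
    selected = []
    for event in events:
        if event.get("eventType") != "Microsoft.Storage.BlobCreated":
            continue
        parsed = parse_blob_subject(event.get("subject", ""))
        if parsed and parsed[0].startswith(container_prefix):
            selected.append(parsed)
    return selected


# Sample events (made up for illustration only).
events = [
    {"eventType": "Microsoft.Storage.BlobCreated",
     "subject": "/blobServices/default/containers/customer-42/blobs/logo.png"},
    {"eventType": "Microsoft.Storage.BlobDeleted",
     "subject": "/blobServices/default/containers/customer-42/blobs/old.png"},
    {"eventType": "Microsoft.Storage.BlobCreated",
     "subject": "/blobServices/default/containers/internal/blobs/notes.txt"},
]
print(select_blob_created(events, "customer-"))
```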

asipras commented 6 years ago

Unfortunately, Event Grid is currently in preview, and available only for storage accounts in the westcentralus and westus2 regions.

mathewc commented 6 years ago

Our current blob discovery algorithm performs two separate discovery scans in the background. First we're scanning back through $logs, and we also have a second container scan algorithm that is scanning the target container for unprocessed blobs. Neither of these is great, but it has been workable. Our goal is to replace these scans with an EventGrid implementation - that's our roadmap.

Now it would be possible to augment our $log scan to also support regular expressions, since that scan is based on matching each container/blob found in the logs with the set of registered blob paths to trigger on. However, the second container scan algorithm isn't amenable to this approach, since it relies on up front knowledge of concrete containers to scan.
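A simplified sketch of that matching step, in Python rather than the SDK's own code, is shown below. It assumes the container/blob paths have already been extracted from the $logs entries, and it uses shell-style wildcards on the container name instead of full regular expressions. The `registrations` mapping and the function names in it are hypothetical.

```python
from fnmatch import fnmatchcase


def match_registrations(log_paths, registrations):
    """Yield (function_name, container, blob) for each logged path that matches.

    `log_paths` are 'container/blob' strings already extracted from the logs.
    `registrations` maps a container wildcard to the function it should trigger.
    """
    for path in log_paths:
        container, sep, blob = path.partition("/")
        if not sep or not blob:
            continue  # skip entries that do not name a blob
        for container_glob, function_name in registrations.items():
            if fnmatchcase(container, container_glob):
                yield function_name, container, blob


registrations = {"customer-*": "ResizeImage", "reports": "ArchiveReport"}
log_paths = [
    "customer-42/logo.png",
    "reports/2017/q1.pdf",
    "misc/readme.txt",
    "customer-7",
]
for hit in match_registrations(log_paths, registrations):
    print(hit)
```

Note that this only covers the log-based scan. The container scan described above would still need a concrete list of containers up front.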

If something cheap/easy can be done for the $log scan, we might be open to it if there is enough demand. However, the $log scan has up to a 10 minute delay due to Azure Storage logging intervals (that's why we have the hybrid approach). But given that we're moving towards the Event Grid model, we're very hesitant to do any more new feature work here.

CrazyTuna commented 6 years ago

I agree that the Event Grid approach is the future.

a-patel commented 5 years ago

@all Any update on this?

brandonh-msft commented 5 years ago

Event Grid & Storage events are no longer in Preview and are the right solution here.

a-patel commented 5 years ago

I want to trigger an Azure Function when a file is uploaded to Azure Blob Storage in ANY container.

I think that currently an Azure Function can be triggered for a specific container only.

Is this the right place to ask this question?

brandonh-msft commented 5 years ago

Storage Events allows this.

MisinformedDNA commented 5 years ago

@a-patel Event Grid is the way forward. It has blob trigger events.

a-patel commented 5 years ago

@brandonh-msft / @MisinformedDNA Can an Azure Function be triggered when a file is uploaded to Azure File Storage (not Blob)?

brandonh-msft commented 5 years ago

Not yet. See list of supported storage event sources here. If this is something you want, just let us know.

ghost commented 5 years ago

Event Grid has no "blob update" events, whereas blobTrigger does. So I have two functions, one with an eventTrigger and another with a BlobTrigger just for handling blob metadata updates. But because I have many containers, a BlobTrigger tied to a single container name is constraining.

brandonh-msft commented 5 years ago

@RockyBalboa2018 I'd encourage you to file new feedback on the product via UserVoice

ghost commented 5 years ago

Done. But Product Manager seems very busy, no answers since May 2018 :-(

brandonh-msft commented 5 years ago

@RockyBalboa2018 what you're requesting is a feature for Event Grid; this is the Azure Webjobs/Functions Host repo. So, ultimately if you've filed on UserVoice you've contacted the right people; if/when EG starts posting BlobUpdated events, you'll be able to get them w/ Functions seamlessly via the HttpTrigger (or EventGrid trigger if you choose to use that).

brandonh-msft commented 5 years ago

@brettsam I think we can close this based on the conversation that's taken place; you can trigger on blobs that match container names using event grid which is the advised way to trigger blob actions now (vs in 2016 when this was filed ;) )

MisinformedDNA commented 5 years ago

@brandonh-msft What's unfortunate is that the Functions team used to drive this, but now we have to rely on the Event Grid team, who have made no progress in creating anything new for Azure Storage since Event Grid's GA. So you offloaded your work to a partner team and now it is dying.

banisadr commented 5 years ago

@MisinformedDNA I'm from the Grid team and was just made aware of this thread. We have in fact added quite a bit of capability since Event Grid's GA, including Advanced filters, which directly address the issue raised at the beginning of this thread.

We also worked with the Azure Blob Storage team to enable four new Event Types for ADLS Gen 2 eventing.

On the topic of Blob Metadata Updated events, we have discussed this with the Blob Storage team in the past, but had not seen significant demand for it. We're currently re-visiting the issue.

On the topic of Azure File Storage events, we are actively working with File Storage on a plan for implementation, though I can't yet share an ETA.

What you need to understand about the eventing story in Azure going forward is that Events are produced by each service directly and Event Grid acts as the conduit for routing, filtering, and delivering those events. While we are actively working with almost all of the major teams in Azure to integrate with Event Grid, the actual step of producing and publishing those events is on each service within Azure. If you want Events from CosmosDB, your feedback is best served directly addressed to that team, same with File or Blob storage. We do our best to track and surface the feedback to each team, but nothing is more effective than your direct feedback.

brandonh-msft commented 5 years ago

Thanks @banisadr! For anybody wondering, here's the UserVoice for Azure Storage and as of today it doesn't look like anybody is asking for Blob update events to be published to grid 🤷🏻‍♂️, but you can be the first! Incidentally, there is one for Azure File event triggers for those interested to go upvote now :)

MisinformedDNA commented 5 years ago

@banisadr Thanks for the comment and updates. It's good to know that we need to hound the services and not your team.

There was an existing issue in Event Grid, but I filed a new one in the Storage forum: https://feedback.azure.com/forums/217298-storage/suggestions/38240866-support-blob-updates-on-event-grid