Open mathewc opened 7 years ago
I agree we should expose something to control this. Since we already maintain the blob scan pointer for tracking our last processed blob, we'd need to make sure that the behavior makes sense when these two interact.
For the sake of argument -- let's call this new property newerThan.

For example:

- newerThan = Jan 1, 2017; last blob scan was Dec 1, 2016 -- We'd skip over all of December when we start processing.
- newerThan = Jan 1, 2017; last blob scan was Feb 1, 2017 -- We wouldn't want to re-process all of January, would we?

In other words -- we'd start our scan from whichever was newest between newerThan and the stored blob scan pointer.
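The "whichever is newest" rule above can be sketched in a few lines. This is only an illustration of the proposed semantics -- `newerThan` is the hypothetical setting named above, and `scan_start` is not part of the actual SDK:

```python
from datetime import datetime, timezone

def scan_start(newer_than: datetime, last_scan_pointer: datetime) -> datetime:
    """Start scanning from whichever timestamp is most recent."""
    return max(newer_than, last_scan_pointer)

# newerThan = Jan 1, 2017; last scan Dec 1, 2016 -> skip December, start Jan 1
print(scan_start(
    datetime(2017, 1, 1, tzinfo=timezone.utc),
    datetime(2016, 12, 1, tzinfo=timezone.utc),
).date())  # 2017-01-01

# newerThan = Jan 1, 2017; last scan Feb 1, 2017 -> don't re-process January
print(scan_start(
    datetime(2017, 1, 1, tzinfo=timezone.utc),
    datetime(2017, 2, 1, tzinfo=timezone.utc),
).date())  # 2017-02-01
```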
As a side note -- I think writing out informational logs (like we do for Timer) would be very helpful here. Something like Found blob scan pointer of {date} and NewerThan value of {date}. Starting scan at {date} because it is the most recent. To change this, ....
It'd only write out once at Listener start and could go a long way towards explaining the logic without needing to look up docs.
This would be very helpful in a few scenarios I came across. My current case: scanning over SQL audit blobs generated by Azure's SQL Blob Auditing feature. We have a pretty high retention rate for those but only need to process the logs going forward, which sounds perfect for an Azure Function with a Blob Trigger -- until you realize you have to let it run in a no-op style over all of the existing blobs, for each host, before it's usable.
This would really help similar scenarios.
One possibility that could help here is using Event Grid's support for routing storage events to azure functions. This approach does not involve any blob scanning which is the cause of the main issue here.
https://docs.microsoft.com/en-us/azure/event-grid/resize-images-on-storage-blob-upload-event
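With the Event Grid route, you can also filter out events for old blobs in your own handler. A minimal sketch, assuming the payload follows Event Grid's BlobCreated event schema (`eventType`, `eventTime`, `subject` are real schema fields; the cutoff and handler function are illustrative):

```python
from datetime import datetime, timezone

# Illustrative cutoff: only process blobs created after this date.
CUTOFF = datetime(2017, 1, 1, tzinfo=timezone.utc)

def should_process(event: dict) -> bool:
    """Return True only for BlobCreated events newer than the cutoff."""
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        return False
    # Event Grid timestamps are ISO 8601 with a trailing "Z".
    event_time = datetime.fromisoformat(event["eventTime"].replace("Z", "+00:00"))
    return event_time >= CUTOFF

event = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "eventTime": "2017-02-01T12:00:00Z",
    "subject": "/blobServices/default/containers/audit/blobs/log1.json",
}
print(should_process(event))  # True
```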
Resurfacing this as this is something I would love to be able to do. Any idea on if/when this might be looked at?
Thx!
No idea at this time (that's what the "unknown" milestone means).
Another year has passed - any update?
Any update? This is kind of annoying, as I have to sit there waiting for 10 minutes for the trigger to reprocess each blob. I'm not sure why, but the receipts get reset sometimes, which means it will reprocess everything.
Hello, I have the same problem. It fires 3 times; presumably these are the events I created for testing. But how do I delete them all, so I can create a new one and run only that one?
In the console there are 3 events that are triggered at the same time:

2020-09-18T15:28:13.541 [Information] Executing 'D2KEventGrid' (Reason='EventGrid trigger fired at 2020-09-18T15:28:13.2184099+00:00', Id=595fc416-0280-43e1-8dc5-f285640e986c)
2020-09-18T15:28:13.569 [Information] Executing 'D2KEventGrid' (Reason='EventGrid trigger fired at 2020-09-18T15:28:13.0399176+00:00', Id=f1796f9a-f06c-47e8-8e67-be34950629a3)
2020-09-18T15:28:13.570 [Information] Executing 'D2KEventGrid' (Reason='EventGrid trigger fired at 2020-09-18T15:28:12.7600042+00:00', Id=c34bb611-2c99-4c1d-ba4b-5675cc87236c)
@pablosguajardo I think you're talking about something different from what is being discussed here, because it looks like you are using Event Grid, while this issue is discussing the behavior of the built-in blob trigger.
What about the additional 'Start time' parameter? Does that have anything to do with this? I can't find documentation for it.
Any update on this? This feature would be very useful.
Is this related to the same blob being triggered for multiple hosts? For example, a blob already processed by a production Function is also triggered when a dev machine runs the function/project locally. We've seen files from YEARS ago start triggering processing.
Any Updates on this functionality?
Any update on the above discussion?
Really need this feature!! Please help add it : )
More than 6 years later, can we at least have some news about it?
I am also hoping for this feature, because when you first publish the trigger in a function app and the container already has files in it, uploading one file for testing triggers processing of the test file plus every other file in the container, instead of just the test file. After that it's fine: the next file you upload will be the only one that triggers.
This would be really helpful!
I too vote for some way to prevent old files from triggering new functions. I just deployed a change to how I process files (a new function, rather than a "remodeling" of the old function), and now have over 1,500 files' worth of useless processing going on.
At least we were smart, and have a check that the file has already been added to our system!
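An "already processed" guard like the one mentioned above can be as simple as a persisted set of blob identifiers. A minimal in-memory sketch (in practice you'd back this with a table or database, and the name+ETag key is one illustrative choice, not a prescribed scheme):

```python
# In practice this would be a durable store (table, DB), not an in-memory set.
processed = set()

def process_blob(name: str, etag: str) -> bool:
    """Process a blob only once, keyed by name + ETag so a changed blob re-processes."""
    key = f"{name}:{etag}"
    if key in processed:
        return False  # duplicate trigger: skip
    processed.add(key)
    # ... actual processing would happen here ...
    return True

print(process_blob("audit/log1.json", "0x8D1"))  # True  (first time)
print(process_blob("audit/log1.json", "0x8D1"))  # False (skipped)
```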
Currently our blob scan algorithm will process ALL blobs in the target container that don't have blob receipts. We should investigate whether we can allow the start date for the scan to be specified.
Scenario: assume all blobs in a container have been processed by a blob trigger function in a particular app (WebJob host). Now, if that function is moved to a different app (different host/host ID) all the blobs will be reprocessed, because there are no receipts for those blobs for that host ID.
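The scenario above follows from receipts being scoped to a host ID; the same blob looks "unprocessed" to a new host. An illustrative model of that behavior (this mirrors the logic, not the SDK's actual receipt storage layout):

```python
# Receipts keyed by (host_id, function_name, blob_name): a new host ID
# has no receipts, so every existing blob appears unprocessed to it.
receipts = set()

def needs_processing(host_id: str, function: str, blob: str) -> bool:
    return (host_id, function, blob) not in receipts

def mark_processed(host_id: str, function: str, blob: str) -> None:
    receipts.add((host_id, function, blob))

mark_processed("prod-host", "ProcessAudit", "audit/log1.json")
print(needs_processing("prod-host", "ProcessAudit", "audit/log1.json"))  # False
print(needs_processing("dev-host", "ProcessAudit", "audit/log1.json"))   # True: reprocessed!
```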