chriscartled opened this issue 1 year ago
Thank you for your feedback. This has been routed to the support team for assistance.
@chriscartled Apologies for the late reply. Thanks for reaching out to us and reporting this issue. We are looking into this issue and we will provide an update.
Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @ahmedelnably @fabiocav.
Adding Service team to look into this.
@ahmedelnably @fabiocav Could you please provide some pointers on this? Thanks in advance.
+1 to this problem; it also applies to Node.js-based functions.
This also happens with the Java Blob Trigger. The message shows every 10 seconds.
My code is very simple:
```java
package com.function;

import com.microsoft.azure.functions.annotation.*;
import com.microsoft.azure.functions.*;

public class BlobExample {
    @FunctionName("BlobExample")
    @StorageAccount("AzureWebJobsStorage")
    public void run(
        @BlobTrigger(name = "content", path = "container/{name}", dataType = "binary") byte[] content,
        @BindingName("name") String name,
        final ExecutionContext context
    ) {
        context.getLogger().info("Java Blob trigger function processed a blob. Name: " + name + "\n Size: " + content.length + " Bytes");
        //context.getLogger().info("Java Blob trigger function processed a blob. Content: " + content);
    }
}
```
I encountered the same issue with my .NET function. I eventually tracked it down to the `ScanBlobScanLogHybridPollingStrategy.PollNewBlobsAsync` method. This was causing millions of transactions per day.

The paged blob API returns a continuation token that indicates whether the current page is the last page. That token is an empty string for the last page, but the code was checking it against `null`. As a result, the latest-modified timestamp was never assigned and stayed at `DateTime.Min`, so every blob was checked again and again against that date.
I noticed someone else fixed the issue here: https://github.com/Azure/azure-sdk-for-net/commit/a69839f4a33ab39a7d1441e36415f08e00b76ff0
I think Microsoft.Azure.WebJobs.Extensions.Storage.Blobs 5.1.2 introduced the fix
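To make the null-vs-empty-string bug concrete, here is a minimal sketch. This is JavaScript for illustration, not the actual C# SDK code; the function names and the `{ blobs, continuationToken }` page shape are assumptions, and only the shape of the comparison mirrors the bug described above:

```javascript
// Buggy sweep: the paged listing signals "last page" with an empty-string
// continuation token, but the commit of the watermark only happens when the
// token is strictly null -- so it never runs and the watermark stays at epoch
// (the DateTime.Min equivalent).
function sweep(pages) {
  let latestModified = new Date(0);
  let current = new Date(0);
  for (const page of pages) {
    for (const blob of page.blobs) {
      if (blob.lastModified > current) current = blob.lastModified;
    }
    if (page.continuationToken === null) { // BUG: never true, token is ""
      latestModified = current;
    }
  }
  return latestModified;
}

// Fixed sweep: treat a null OR empty token as end-of-listing, so the
// watermark is committed after the last page.
function sweepFixed(pages) {
  let latestModified = new Date(0);
  let current = new Date(0);
  for (const page of pages) {
    for (const blob of page.blobs) {
      if (blob.lastModified > current) current = blob.lastModified;
    }
    if (!page.continuationToken) { // "" and null both end the sweep
      latestModified = current;
    }
  }
  return latestModified;
}
```

With the buggy version the watermark never advances past epoch, which is exactly why every scan polls "for blobs newer than '0001-01-01T00:00:00.000'".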
Edit: after switching to the latest version (5.1.3), I found that this polling mechanism still continuously re-checks the most recent batch of files, due to these lines:

Above, `CurrentSweepCycleLatestModified` is set to the most recent last-modified date. This then becomes `LastSweepCycleLatestModified` afterwards.

If no new files are coming in, `LastSweepCycleLatestModified` stays assigned to the last-modified date of the most recent file, still unchanged. As a result, those most recent files are checked over and over again, continuously.

According to the earlier comment about rounding, it seems like you can't really change that comparison logic.

In my case I had copied over a batch of 60 files at the same time (out of ~700 in the container), and they were continuously being re-checked every 10 seconds (the `PollingInterval`).
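A minimal sketch of that steady-state behaviour (JavaScript for illustration; the function name and blob shape are assumptions, and the real extension's filtering is more involved):

```javascript
// If the watermark comparison is inclusive (>=, which the rounding of
// last-modified times to whole seconds forces it to be), every blob whose
// lastModified equals the stored watermark is selected again on each cycle.
function blobsToRecheck(blobs, lastSweepCycleLatestModified) {
  return blobs.filter(b => b.lastModified >= lastSweepCycleLatestModified);
}

// Example: 60 files copied at the same instant share one lastModified.
// Once that instant becomes the watermark, all 60 match on every poll.
```

With no new uploads the watermark never advances, so the same batch is listed again on every polling interval (every 10 seconds here) until newer blobs arrive.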
+1, Node.js 20 on Windows.
Same behaviour. It's even stranger, since new blobs are also skipped, with the same message:
```javascript
const { app, output } = require('@azure/functions');
const csv = require('csvtojson');
const { v4: uuidv4 } = require('uuid');

const eventHubOutput = output.eventHub({
    eventHubName: 'mut-admin-detalle-tabla-dividendo',
    connection: 'EventHubConnection',
});

app.storageBlob('mut-admin-detalle-tabla-dividendo', {
    path: 'landing/mut-admin-detalle-tabla-dividendo/{name}',
    connection: 'StorageConnection',
    return: eventHubOutput,
    handler: async (blob, context) => {
        context.log(`Storage blob function processed blob "${context.triggerMetadata.name}" with size ${blob.length} bytes`);
        const stringBlob = blob.toString('utf8');
        const batchId = uuidv4().toString();
        const landedTimestamp = new Date();
        const blobUri = context.triggerMetadata.uri;
        const jsonBlob = await csv({ delimiter: '|' })
            .fromString(stringBlob)
            .subscribe((json) => {
                return new Promise((resolve, reject) => {
                    try {
                        json['batch_id'] = batchId;
                        json['message_id'] = uuidv4().toString();
                        json['landed_out_dateTime'] = landedTimestamp;
                        json['blob_uri'] = blobUri;
                        resolve();
                    } catch (e) {
                        reject(e);
                    }
                });
            });
        context.log(`Sending ${jsonBlob.length} messages to Event Hubs`);
        return jsonBlob;
    },
});
```
### Library name and version

Azure.Storage.Blobs 12.15.1

### Describe the bug

I have a blob trigger set up to monitor one of my containers in Azure Storage. The associated scan over the blobs fires non-stop, looping over the same blobs again and again and causing excessive operations on the storage account.

### Expected behavior

Blobs that have already been processed should be skipped.

### Actual behavior

A scan fires immediately after the last one has finished. I'm not sure whether that is expected behaviour, but what is definitely not expected is that the same blobs are detected by the scan over and over again.

I have noticed that at the start of each scan, this log message is printed to the function logs:

```
[Verbose] Poll for blobs newer than '0001-01-01T00:00:00.000' in container '<container>' with ClientRequestId 'xxx' found 1839 blobs in 302 ms. ContinuationToken: False
```

Comparing with other blob triggers, I have discovered that the relevant scaninfo file does not seem to have been created/updated. If I manually create the scaninfo file with a date/time, the next scan will poll for blobs newer than that time, but the scaninfo file is then not updated again automatically.

Blob receipts are being created, and each blob is processed with a message similar to the following:

```
[Verbose] Blob '<blob name>' will be skipped for function 'FillBlobMetaData' because this blob with ETag '"0x8D8798F7B192688"' has already been processed. PollId: '566d148e-11d0-4a80-86a0-c7d4cfab3b58'. Source: 'ContainerScan'.
```

### Reproduction Steps

Enable my function within the function app on the Azure Portal.

### Environment

- .NET 6.0 Azure Functions app, runtime version 4.16.5.20396
- Microsoft.Azure.WebJobs.Extensions.Storage.Blobs v5.1.1