Crash with out of memory with rapid blob changes

AartBluestoke commented 4 years ago

When there are many (>1 million/hour) blob writes and a blob trigger anywhere on the same storage account (even on a different container), job hosts with limited memory can crash due to the log watcher materializing large arrays.

Repro steps


Have some code that writes many blobs. The following code is hacked together to simulate failure conditions similar to what was observed in production (warning: it could run up a large bill by doing all the blob writes): https://gist.github.com/AartBluestoke/48115e7a80ac1df2b8360af0d58948b9
In a different Azure Function, on a different function host, have a blob trigger somewhere in the same storage account.

Expected behavior

Code runs as normal: one function writing lots of blobs doesn't negatively impact other functions (other than directly requested work).

Actual behavior

The function crashes with an out-of-memory error. Analysis of a crash dump memory snapshot shows almost all memory used by 800,000 blobs being held in 2 arrays within "BlobLogListener.GetRecentWritesAsync".

https://github.com/Azure/azure-webjobs-sdk/blob/85d463faa28790d72f0cda8f00b95db1030ba7b0/src/Microsoft.Azure.WebJobs.Extensions.Storage/Blobs/Listeners/PollLogsStrategy.cs#L124 materializes the enumerable of all recent changes for that container into a single array, even if there is no BlobTrigger attached to that container

https://github.com/Azure/azure-webjobs-sdk/blob/c9d92b2c271e1f4bd8120fc2f7b6cea5a50289a7/src/Microsoft.Azure.WebJobs.Extensions.Storage/Blobs/Listeners/BlobLogListener.cs#L55 will also materialize a blob list of all blobs modified within the threshold (the comment indicates 2 hours).

A) materializing all responses from a batch interface into a single list is not good practice. B) combining the two behaviors above means that you read a (large) list into 1 array, then re-group and re-materialize the list into a second array.
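To illustrate point A, here is a minimal sketch of the streaming alternative. It is written in Python for brevity and is not the SDK's C# code. The names `read_log_pages`, `recent_writes_for`, and the `(container, blob)` entry shape are hypothetical, chosen only for this example. The idea is to consume log entries one at a time and drop entries for containers that have no trigger, so nothing is copied into an intermediate array.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical log entry shape: (container name, blob name).
LogEntry = Tuple[str, str]


def read_log_pages(pages: Iterable[List[LogEntry]]) -> Iterator[LogEntry]:
    """Yield entries one at a time instead of concatenating every page."""
    for page in pages:
        yield from page


def recent_writes_for(
    entries: Iterable[LogEntry], watched: Set[str]
) -> Dict[str, Set[str]]:
    """Group written blob names by container, keeping only watched containers."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for container, blob in entries:
        # Entries for containers without a trigger are discarded immediately.
        if container in watched:
            grouped[container].add(blob)
    return dict(grouped)
```

With this shape, memory use in the sketch grows with the number of distinct blobs written to watched containers, not with the total number of writes in the storage account.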

Known workarounds

None. Do not use a blob trigger on any containers in a storage account that has a high blob write volume.

Related information

Further discussion with Azure support staff is on ticket 120081723001883. Full memory dump available on request.

pragnagopa commented 4 years ago

cc @mathewc as FYI

Moving issue to Triaged milestone.

AartBluestoke commented 3 years ago

Hi, I have just seen this again. I have a function exploding with 2GB of RAM usage (then a crash) every 5 minutes at the moment. This was triaged above. Is there a timeline for any next steps? Thanks, Andrew.
