Azure / azure-functions-python-worker

Python worker for Azure Functions.
http://aka.ms/azurefunctions
MIT License

[BUG] Very High Memory with Blob Input Binding #1449

Closed ajstewart closed 3 months ago

ajstewart commented 3 months ago

Investigative information

EDIT: See the next comment; I also observe high memory usage locally, not just in K8s.

This is a bit of an odd one, but I believe something is wrong with memory usage when using the blob input binding and deploying the function to Kubernetes.

My scenario is that I was creating quite a basic function that would:

This was:

The function seemed to run OK locally using Azure Functions Core Tools (Mac M1), but when deployed to K8s, either in Minikube or an AKS cluster, the memory requirements were very large. In fact, the pod memory kept growing until it hit the resource limit (which I had set to 1 Gi) and the pod was restarted. Nothing about the function should require this memory footprint at all.

I have tested both async and sync function definitions.

I also tested Python 3.10.

After investigating, I narrowed it down to the blob input binding. If I removed the binding and instead used the azure-storage-blob SDK to load the image inside the function, memory usage went back to normal.
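A minimal sketch of that workaround, assuming the azure-storage-blob v12 package and a connection string stored in an app setting. The helper names and the `AzureWebJobsStorage` default are illustrative assumptions, not taken from the repro repo. The client is passed into `read_blob_bytes` so the read can be exercised without a real storage account.

```python
import os


def read_blob_bytes(blob_client) -> bytes:
    """Read a whole blob through an azure-storage-blob BlobClient.

    The full blob still ends up in worker memory, but the bytes come
    straight from Storage rather than being handed over by the host.
    """
    return blob_client.download_blob().readall()


def make_blob_client(container: str, blob_name: str,
                     setting: str = "AzureWebJobsStorage"):
    """Build a BlobClient from a connection string held in an app setting.

    Assumption: the connection string lives in the app setting named by
    `setting`; adjust this to match your own configuration.
    """
    # Imported lazily so this module still imports where the SDK is absent.
    from azure.storage.blob import BlobClient

    return BlobClient.from_connection_string(
        os.environ[setting], container_name=container, blob_name=blob_name
    )
```

Inside the function body, something like `image_bytes = read_blob_bytes(make_blob_client("images", blob_name))` would take the place of the blob input binding, where `"images"` and `blob_name` are hypothetical.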

I have used the blob input binding in other functions, but this is the first time I have deployed to K8s.

So I'm not sure if I'm doing something wrong here, but something doesn't seem quite right.

Repro steps

As this is quite a convoluted setup, I have created a repo with instructions that reproduces the problem:

https://github.com/ajstewart/azure-function-input-binding-memory-demo

(Note that while the repo says I used 10 messages, there were 18 messages in the system for the results below, due to various stops and starts while testing.)

Below are the results for this example function. In my real function, which has other dependencies and a few extra steps, memory increases very rapidly as it runs.

Using the Input Binding

The pod memory continues to grow.

[Screenshot: pod memory usage with the input binding]

Sometimes it frees up a bit, but it never gets down to the level seen with the SDK below. Usually it ends up reaching a level that forces a pod restart.

High memory in Application Insights:

[Screenshot: Application Insights memory with the input binding]

Using the SDK

The pod memory hovers around 300 MB, as you would expect.

(Whoops, I forgot the screenshot.)

Application Insights reports between 500 and 600 MB of committed memory, which doesn't move much.

[Screenshot: Application Insights committed memory with the SDK]

Expected behavior

I expect to be able to run the Azure Function using the blob input binding without excessive memory usage.

Actual behavior

With the blob input binding, memory keeps growing until the pod is restarted.

Known workarounds

Use the SDK instead of the binding.

ajstewart commented 3 months ago

An update: I realised I had not actually run the function locally while monitoring memory usage.

I see a very similar memory problem just running the function with func start.

These are screenshots from Activity Monitor on my Mac:

With input binding

[Screenshot: Activity Monitor with the input binding]

With SDK

[Screenshot: Activity Monitor with the SDK]

From what I tested, memory with the input binding doesn't keep growing locally, but it jumps around a lot and seems to be around 5x the memory footprint of the SDK.
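To compare runs without relying on Activity Monitor, a small helper can log the Python worker's own peak memory from inside the function. This is a sketch assuming a Unix-like platform (the `resource` module is not available on Windows), and the helper names are hypothetical. It only sees the Python worker process, not the separate Functions host process, and `ru_maxrss` is a high-water mark, so it shows the worst case rather than the jumps visible in Activity Monitor.

```python
import logging
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of the current process, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS but in kibibytes on Linux.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def log_peak_rss(label: str) -> float:
    """Log the worker's peak RSS so binding and SDK runs can be compared."""
    value = peak_rss_mib()
    logging.info("%s: peak RSS %.1f MiB", label, value)
    return value
```

Calling `log_peak_rss("after load")` at the end of each invocation, once with the binding and once with the SDK, gives numbers that can be read from the function logs in both the local and K8s setups.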

I've changed the title to reflect that there's a significant memory difference outside of deployment in K8s.

hallvictoria commented 3 months ago

Hi @ajstewart, this is expected behavior. With the blob input binding, the entire file is passed from the host to the worker through gRPC messaging and is loaded into memory. The Azure Blob Storage SDK communicates directly with the storage account instead of receiving the blob from the host, so it doesn't run into the same memory usage.

ajstewart commented 3 months ago

Hi @hallvictoria thanks for the answer!

Though I'm not sure I understand it. I see what effectively looks like a leak: if I use the binding, memory eventually grows too much and the process is usually killed with an OOM, especially when running in a container.

In the experiment, I am also using the SDK to fully read the file into memory and assign it to a variable, so I don't understand how there can be such a huge gulf in memory usage. With the SDK, I at least see memory freed every now and then after the function has finished.

As it stands, I've had to abandon the blob input binding because its memory usage keeps growing.

hallvictoria commented 3 months ago

Yes, unfortunately that's a known limitation of blob bindings right now. The main difference between using a blob input binding and the SDK is how the file is loaded.

With the blob input binding, the file is loaded by the host and passed to the worker through a gRPC messaging channel. gRPC messaging has limitations, so memory can grow and overload the channel, especially under high load or with large files, resulting in OOMs.

When using the SDK, you won't run into the same limitations. Even though in your function app you're using the SDK to read in the full file, the file isn't being sent through that messaging channel (which is where the memory growth issue is occurring).
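To make the difference concrete, here is a toy, stdlib-only model. It is not the worker's actual code path: pickle stands in for the gRPC message, and it does not reproduce the channel or any growth over time. It only shows that a message-based hand-off keeps at least one extra full copy of the blob alive at its peak, whereas a direct read holds a single copy. The 5 MiB size mirrors the PNG size the reporter mentions; the function names are hypothetical.

```python
import pickle
import tracemalloc

# A 5 MiB stand-in for the PNG (zero bytes; only the size matters here).
BLOB = bytes(5 * 1024 * 1024)


def peak_allocated(fn) -> int:
    """Run fn() and return the peak number of bytes allocated while it ran."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def via_message() -> int:
    # Toy "host": copy the whole blob into one serialized message.
    message = pickle.dumps({"inputblob": BLOB})
    # Toy "worker": decode the message, creating a second full copy
    # while the message itself is still alive.
    payload = pickle.loads(message)["inputblob"]
    return len(payload)


def direct_read() -> int:
    # The worker obtains the bytes itself: a single full copy in memory.
    data = bytearray(BLOB)
    return len(data)


if __name__ == "__main__":
    print(f"message hand-off: {peak_allocated(via_message) / 2**20:.1f} MiB peak")
    print(f"direct read:      {peak_allocated(direct_read) / 2**20:.1f} MiB peak")
```

The exact peaks vary with the Python version, but the message hand-off should come out at roughly twice the direct read, and that ratio scales with how many invocations are in flight at once.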

Let me know if that makes sense or if you have any other questions!

ajstewart commented 2 months ago

Thanks @hallvictoria for your response, it's much appreciated.

This reads as a known issue rather than a limitation; is that fair to say? Do you know if this is planned to be addressed?

The blob input binding is really useful, and it's a shame I've had to turn it off, especially since it's only a 5 MB PNG file and the function is triggered only about 50 times an hour. It seems like it should be able to handle this without overloading the memory.

Is it relatively new? I've used the input binding before and not really noticed it, though admittedly this is also the first time I've run the function outside of an Azure Function App deployment.

hallvictoria commented 2 months ago

I don't think this is new, but yes it's known. We'll be introducing a feature that addresses this soon! Stay tuned :)