Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License
1.92k stars 442 forks source link

Function app consumes a lot of resources event when functions are not triggered #2679

Open IngweLand opened 6 years ago

IngweLand commented 6 years ago

I have a function app with few functions. My function app is in consume plan (not App services plan). I develop and publish them from Visual Studio 2017 v 15.6.6. Today I have found out that the function app starts to constantly consumes around 60-70 Mb of memory right after I publish it from Visual studio. I also see constant in and out data (out is probably to storage account for logging). At the same time requests count stays at zero level (and I am not triggering this function by any way). This already resulted in increased bills for storage account because there were thousands of requests coming in (probably because of logging). Restarting the function app does not help. However, stopping it in the portal and then starting helps (still not clear if it helps in 100% of cases). Also, Application insights shows usage of 15+ servers. Again, I have to mentions, that all this happens when functions are not triggered at all.

IngweLand commented 6 years ago

Note for those who will check the app. I have to stop it because it consumes resources and results in increased bills.

paulbatum commented 6 years ago

@tohling this looks like an overscaling problem, can you take a look?

tohling commented 6 years ago

@IngweLand, I checked our logs from 2018-04-15, and it looks like for the function IoTCriticalFunction, there was a message that was stuck in the ServiceBus Queue for a long time. Our current scaling logic will add worker instances if we see that the length of the queue or the age of the first message in the queue keeps increasing. This caused the Function instance to scale out to many instances. Do you have insights on why that message was not processed by its corresponding Function during that timeframe?

IngweLand commented 6 years ago

@tohling To tell the truth, I do not know. I can only assume that the message arrived to the Service bus when Functions app was down. At the same time, Default message time to live on that queue is set to 1 min. And it was this way for a long time.

tohling commented 6 years ago

@IngweLand, do you know if there could be Deferred Messages in that queue?

IngweLand commented 6 years ago

I have investigated the issue more and I think I know the answers on some questions. This involves not only Function app but also other connected resources (but all still connected to Function app). If you think I should post the issue in another place, please, let me know. There are a lot of info there, so maybe you can first check the bottom part 4) - it's a new test function which also has the issue.

So, I am processing messages from IoT hub with the function. In this particular case the function name is IoTTelemetryFunction. When function starts for the very first time, it gets all old messages from IoT hub. I guess it's 'by design'. However, it's still not clear why it's working like that. In my case, my IoT hub still has messages which are 2 months old, and the total amount of previous messages are more than 40000. It's still absolutely not clear how resources are consumed when getting those old messages. According to charts (I mean function's analytics) incoming message count when getting those old messages stays at zero level. At the same time, Memory, Data In/Out raises. Also, what is worse, the usage of storage account which is connected to the Function app goes incredibly high. Note, that I am not storing any data. My function gets the message from IoT hub and puts it into Service Bus topic, that's it. What is even more strange, why the function started to receive all those old messages (with one of my previous deployment). The thing is that I was using that function before already, with 100% same IoT hub consuming group. This means, that I should not get old messages when they has been already delivered. Also, it's not clear, why it took few days of processing (the function was heavily running and consuming resources until I stopped it manually).

On a side note. Probably, it was not related to Visual Studio. Though I am not 100% sure.

I did not find a way to attach images here. So, I am uploading them to my server and providing links. 1) http://ingweland.com/temp/charts.PNG Two bottom charts. Left is for Storage, right for Functions. Storage shows File transactions. You can see millions of events for few days. 2) http://ingweland.com/temp/IoTTelemetryFunction.cs Source code of the function 3) http://ingweland.com/temp/charts_18.04.18.PNG This shows another issue with Functions. This is current state. Function's executions and resources stays at zero or near it (because there was zero incoming messages coming into it), however, connected Storage account constantly consumes resources. Again, this Storage account is connected only to Functions and is not used anywhere else (also, it's not used from my code, so all its resources are used internally by Azure). The dark blue line is for Blob transactions. 4) http://ingweland.com/temp/test_storage.PNG This image requires more explanation. I created a new test Function app, 1.0 runtime. https://ingwefunctest.azurewebsites.net It has single function in it created in the Portal. The function is a bare minimum created by the portal when creating IoT hub triggered function. Everything it does is log incoming message. I connected it to same IoT hub, but different consuming group. As expected, it processed all those old messages first. However, in this case, the processing took minutes, not days. But the idea of the test was to measure resource consumption when the function is not executed. The chart shows function memory and execution count as well as Blob transactions of connected storage account. Function's resources stays at zero, but storage generated 8.5 thousand transactions! My Function app is on Consume plan. Azure says I will pay only when the function will be executed. I must connect those storage to function, this is a mandatory step. So, if my function is not executed, why am I still paying?

IngweLand commented 6 years ago

@tohling Yeah, I have just checked in Service bus explorer and see 1 Dead message there

IngweLand commented 6 years ago

@tohling Do I understand correctly, if there is such a Dead message, it will cause this issue again? Because the function is running right now for few days, without such a huge issue as before. But i actually did not invoke Critical function. I have another one in that Function app, to process normal messages. I thought the issue was with it.

IngweLand commented 6 years ago

@tohling Also, you said that the logs showed that message on April 15. However, the function was consuming a lot of resource in between April 11-15

philbritton commented 6 years ago

I’m also seeing a similar issue, stuck service bus message, scales out function to max but nothing seems to be doing any work (no logs generated) I am putting messages on the sb queue with a function that receives a post and adds to queue. During time of stuck message generated a lot of sb activity, assuming all instances polling queue.

tohling commented 6 years ago

@IngweLand , yes if you have a deferred message stuck in the front of the queue, it would cause of over-scaling. Let me provide some background to explain why this is happening:

The ServiceBus (SB) implementation puts all message types (active and deferred) in the same queue. Every 10 seconds, the scaling logic executes a Peek() operation on the SB queue to get the length and the age of the first message in the queue. If either the length or the age is increasing at a consistent rate, the scaling logic will scale-out and add a new instance.

Actionable messages that can be consumed by your Function code are active messages but not deferred messages. To consume a deferred message, the sequence ID of the message that was deferred should be cached/stored at the time of creation and later used to retrieve it from the SB queue. This stateful scenario is currently not supported in Azure Functions. In fact, there is no intuitive way to Peek() past this deferred message, because SB does not provide any API to Peek only the first actionable message. In summary, deferred messages will cause over-scaling.

If the deferred message that is stuck in your SB queue was unintentional, you will need to remove it out-of-band. If your workflow anticipates some deferred messages occasionally, then you could move them into a separate queue in your Function code to be processed separately by another Function. This is to ensure that your SB queue will always only have "immediately actionable" messages to workaround the over-scaling issue.

Regarding your question about the logs, I searched the logs for April 15 because that is when you opened this GitHub issue. You are correct in that, the issue may have started occurring on April 11th.

Lastly, regarding the cost of Consumption Plan, the Azure Functions team only charges the customer when the Function actually executed. In the case of scaling to 15 instances as per your Application Insights logs, you are not charged for those instances if you Function code did not execute on any of those 15 instances. If it ran only 1 time on 1 out of the 15 instances, you will be charged only for that 1 execution. Additionally, you will also be charged by other teams for the resource consumption of their products/services, such as Azure Storage and Azure ServiceBus.

IngweLand commented 6 years ago

@tohling What exactly is deferred message? Could you please give me the link to the docs. I thought this is what is called DeadLetter message, but probably is not.

tohling commented 6 years ago

@IngweLand here's the official doc on Deferred messages https://docs.microsoft.com/en-us/azure/service-bus-messaging/message-deferral

IngweLand commented 6 years ago

@tohling Also, if that issue was caused by that deferred message, I am not sure how it was "fixed". It's gone now, and everything i did after April 15 - just stopped the Function app for half of the day or so.

tohling commented 6 years ago

@IngweLand, all SB messages have a default expiration time (that is also configurable). My guess is that the stuck deferred message expired and if you have the dead-lettering feature enabled for your queue, the message got moved to your dead-letter queue.

IngweLand commented 6 years ago

@tohling I need to read more about those deferred messages and how to deal with them. However, the general question about unclear resource consumption stays open. Please, check my today's big message, part (4). I specifically created minimal Function which is connected to brand new Storage account. This Storage account is not used by any of my code directly. Thus, I can assume that all it's activity goes from underlying implementation of Function app. I cannot understand, why my function, when consuming zero resources (zero executions, zero memory consumption) still result in hundreds of storage operations per hour.