Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

"Azure functions runtime is unreachable" error exactly one year after app deployment. #9113

Open mithunshanbhag opened 1 year ago

mithunshanbhag commented 1 year ago

The sequence of events

  1. Exactly a year ago (on 2/23/2022), I'd deployed my APIs to my Azure function app.

  2. Yesterday (i.e. a year later, on 2/23/2023), I noticed that the APIs started returning 503 / unavailable errors. Upon logging into the Azure portal, I saw the "Azure Functions runtime is unreachable" error. Restarting the app didn't help.

  3. The Diagnose and solve problems tab on Azure portal led to the following discovery: 11 instances of Microsoft.AspNetCore.Connections.ConnectionAbortedException. Not entirely sure how much this is related to the app's downtime.

  4. I opened a support ticket 2302230030002367 for this issue. While the root-cause investigation was inconclusive, I was able to resolve the issue by simply redeploying the app once again.

Probable root cause

  1. I later stumbled upon the possible root cause (thanks to Erik_ERBBQ). The value of the WEBSITE_RUN_FROM_PACKAGE app setting is a URL containing a SAS token that expires exactly one year after deployment!

  2. The SAS token is generated by the Azure pipeline deployment task AzureFunctionApp@1 which I use to deploy my app.

  3. And here is the line of code to blame (LINK). So I guess, this bug really belongs in that github repo(?)
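
To illustrate the root cause above, here is a minimal sketch that reads the expiry time out of a WEBSITE_RUN_FROM_PACKAGE value. It assumes the value is a blob URL whose SAS token carries the standard `se` (signed expiry) query parameter in ISO 8601 UTC form. The helper names and the sample URL used in the note below are hypothetical and not taken from the deployment task.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def sas_expiry(package_url):
    """Return the SAS expiry (the `se` query parameter) as an aware datetime.

    Returns None when there is no `se` parameter, for example when the
    setting is "1" or a URL without a SAS token.
    """
    values = parse_qs(urlsplit(package_url).query).get("se")
    if not values:
        return None
    # Assumption: `se` looks like 2023-02-23T10:15:00Z (ISO 8601, UTC).
    expiry = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def is_expired(package_url, now=None):
    """True if the URL carries a SAS token whose expiry has passed."""
    expiry = sas_expiry(package_url)
    if expiry is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now >= expiry
```

Calling `sas_expiry(...)` on a made-up value such as `https://examplestorage.blob.core.windows.net/deployments/package.zip?sv=...&se=2023-02-23T10%3A15%3A00Z&sig=...` would give the moment at which the host can no longer download the package.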

Investigative information

Related information

Other investigation notes

  1. I did look through the MSDN documentation for the "Azure Functions runtime is unreachable" error (LINK), but nothing conclusive stood out.
  2. I also rotated the storage account key/connection string used in the AzureWebJobsStorage app configuration setting. It didn't help.
  3. The following github issues might be related (but I'm not 100% sure).

mithunshanbhag commented 1 year ago

A few quick thoughts/notes/observations: I think an entire class of errors could be pre-empted if the "Diagnose and solve problems" wizard and the configuration blade started flagging invalid settings (including expired SAS URLs), similar to how invalid Key Vault references are flagged.

Also, I found this documentation page very useful. But, IIRC, I got to that page by doing a google/bing search for the "Azure Functions runtime is unreachable" error. I wish the portal had linked me to that page.

zachbugay commented 1 year ago

We have also experienced this same issue.

TroyWitthoeft commented 1 year ago

Same. Also experienced this in the past. Opened a ticket too. Relevant prior issue = https://github.com/microsoft/azure-pipelines-tasks/issues/14837

We run a fleet of Azure functions and this is known as the dreaded "birthday bug". It likes to pop up on holiday weekends. 😉

It's surprising that this isn't addressed anywhere inside the Azure tooling.
We've resigned ourselves to creating a function app that scans our Azure instance for expiring SAS tokens.
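
A scanner along the lines described above could be sketched roughly as follows. This is not the commenter's implementation. It assumes the app settings have already been fetched by whatever means you normally use and are passed in as plain dictionaries, and that each SAS token carries an ISO 8601 `se` (expiry) parameter. The function names and the 30-day default are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

SETTING = "WEBSITE_RUN_FROM_PACKAGE"


def days_until_expiry(package_url, now):
    """Days until the SAS `se` timestamp (negative if already expired), or None."""
    values = parse_qs(urlsplit(package_url).query).get("se")
    if not values:
        return None  # no SAS token in this value, e.g. "1" or an empty string
    expiry = datetime.fromisoformat(values[0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)
    return (expiry - now) / timedelta(days=1)


def find_expiring(apps, warn_days=30, now=None):
    """Return (app_name, days_left) pairs for apps expiring within warn_days.

    `apps` maps an app name to its dictionary of app settings.
    Results are sorted with the most urgent app first.
    """
    now = now or datetime.now(timezone.utc)
    flagged = []
    for name, settings in apps.items():
        days_left = days_until_expiry(settings.get(SETTING, ""), now)
        if days_left is not None and days_left <= warn_days:
            flagged.append((name, days_left))
    return sorted(flagged, key=lambda item: item[1])
```

Already-expired apps come back with a negative `days_left`, so they sort ahead of apps that are merely close to expiry.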

fabiocav commented 1 year ago

Thank you @mithunshanbhag for the super detailed issue!

I've created an issue to track an improvement to ensure a warning is emitted in #9358

For your scenario, I'd also recommend configuring the app to use a managed identity, as described here, as that would avoid the issue with the expiration (@TroyWitthoeft, this should help address the scanning need you've mentioned above).

Also following up with tooling and deployment teams to identify additional enhancements we can make here.

Thanks!

javast97 commented 1 year ago

Thanks man, this solved my day.

Works like a charm!

DerChrisser commented 1 year ago

Finally found this thread. It solved my issues, thanks! @mithunshanbhag

TroyWitthoeft commented 1 year ago

Awesome! It's the little things ... I know a warning will help reduce the troubleshooting time. Thank you.

@fabiocav - With regard to your suggestion for using a managed identity, the link you posted as an example comes right back here? Mislink? I get the impression that using a function app's MSI instead of a SAS token is a possible mitigation?

TroyWitthoeft commented 11 months ago

Hey! I just got a warning on one of our function apps! 🎉 Nice!

It's a bit early, but the functionality is there!

cleferman commented 8 months ago

Just stumbled upon this issue today with no errors whatsoever. It was happening on an Azure Function app which celebrated its first anniversary since the last deployment on the 25th of January 2023 (no one noticed it until now because it's only used at the end of the month). Symptoms: Both QA and PROD environments were returning a 503 on all HTTP endpoints.

No "Azure Functions runtime unreachable" message. No SAS token expired message or set to expire like @TroyWitthoeft has (even after redeployment). The Diagnostic tab was unhelpful also. It just told me "Hey we noticed your function was down for the last 15 minutes", yeah no shit.

Resolution: Just redeploy the function app and it will start working again.

Any idea why WEBSITE_RUN_FROM_PACKAGE is a SAS token URL for Azure Functions but for app services it's just "1"? There HAS to be a better way of handling this. Who would expect their app to stop working after a year with no warning?

Ronnehag commented 3 weeks ago

Any idea why WEBSITE_RUN_FROM_PACKAGE is a SAS token URL for Azure Functions but for app services it's just "1"? There HAS to be a better way of handling this. Who would expect their app to stop working after a year with no warning?

Azure Functions runs the package from a storage account, which usually requires a SAS token or a managed identity for access.

For an app service, you run the zip file from wwwroot, but app services also support running from a storage account. You can read this in their documentation. I suggest switching over to a managed identity instead. Your other option is to redeploy before the token expires.

There's also a difference between Linux and Windows which you can read here.

You must maintain any SAS URLs used for deployment. When an SAS expires, the package can no longer be deployed. In this case, you must generate a new SAS and update the setting in your function app. You can eliminate this management burden by using a managed identity.
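
As a rough illustration of the managed identity option, the sketch below derives the two app settings from an existing SAS URL. It relies on my reading of the "run from package" documentation: WEBSITE_RUN_FROM_PACKAGE holds the blob URL without a SAS token, and WEBSITE_RUN_FROM_PACKAGE_BLOB_MI_RESOURCE_ID names the identity (`SystemAssigned`, or the resource ID of a user-assigned identity). Treat both as assumptions to verify against the current docs. The helper name is hypothetical, and the identity is assumed to already have read access to the blob.

```python
from urllib.parse import urlsplit, urlunsplit


def managed_identity_settings(package_url, identity_resource_id="SystemAssigned"):
    """Build app settings that point at the package blob without a SAS token.

    Assumption: the chosen identity has already been granted read access
    to the blob; this helper only rewrites the settings.
    """
    parts = urlsplit(package_url)
    # Drop the query string (the SAS token) and any fragment.
    bare_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return {
        "WEBSITE_RUN_FROM_PACKAGE": bare_url,
        "WEBSITE_RUN_FROM_PACKAGE_BLOB_MI_RESOURCE_ID": identity_resource_id,
    }
```

Because no expiry is embedded in the resulting URL, there is nothing left to renew yearly. Access then depends on the identity's role assignment on the storage account instead.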