Azure / azure-functions-dotnet-worker

Azure Functions out-of-process .NET language worker

Intermittent 502 Errors in some regions for .NET 8 on Windows #2075

Closed MikeWhalenIII closed 7 months ago

MikeWhalenIII commented 9 months ago

I just updated my Azure Functions project to .NET 8, and since then, I've been getting intermittent 502 errors with my HTTP-triggered functions.

Watching the log stream, I've discovered that the 502 error always seems to be returned after the "DrainMode mode enabled" message appears.

2023-11-18T04:11:35.832 [Information] DrainMode mode enabled
2023-11-18T04:11:35.832 [Information] Calling StopAsync on the registered listeners
2023-11-18T04:11:35.833 [Information] Stopping the listener 'Microsoft.Azure.WebJobs.Host.Listeners.SingletonListener' for function 'ComputeTopCompaniesStats'
2023-11-18T04:11:35.852 [Information] Stopped the listener 'Microsoft.Azure.WebJobs.Host.Listeners.SingletonListener' for function 'ComputeTopCompaniesStats'
2023-11-18T04:11:35.854 [Information] Call to StopAsync complete, registered listeners are now stopped
2023-11-18T04:11:38.922 [Information] Host Status: {"id": "appName","state": "Running","version": "4.27.5.21549","versionDetails": "4.27.5+096e48198ef4cc3259d9ad0f45891a74992825a6","platformVersion": "100.0.7.517","instanceId": "e09f620ecddc73363f5b6301cd6bcd1b8cf51ffcc5d31d10303d898464685ffb","computerName": "10-30-3-172","processUptime": 332716,"functionAppContentEditingState": "Unknown"}
2023-11-18T04:12:04.315 [Information] Could not resolve CoreCLR path. For more details, enable tracing by setting COREHOST_TRACE environment variable to 1
2023-11-18T04:12:04.325 [Error] Exceeded language worker restart retry count for runtime:dotnet-isolated. Shutting down and proactively recycling the Functions Host to recover
2023-11-18T04:12:09.433 [Information] Stopping JobHost
2023-11-18T04:12:09.434 [Information] Stopping the listener 'Microsoft.Azure.WebJobs.Host.Listeners.SingletonListener' for function 'ComputeTopCompaniesStats'
2023-11-18T04:12:09.454 [Information] Stopped the listener 'Microsoft.Azure.WebJobs.Host.Listeners.SingletonListener' for function 'ComputeTopCompaniesStats'
2023-11-18T04:12:09.457 [Information] Job host stopped

** 502 error when attempting to call an HTTP-triggered function.

2023-11-18T04:12:55.583 [Information] Worker process started and initialized.
2023-11-18T04:13:17.289 [Information] Executing 'Functions.GetTopCompanies' (Reason='This function was programmatically called via the host APIs.', Id=dc4f3180-8ae0-4ffb-b505-5d9ee525842f)

** No more 502 error, and my function returns 200.
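
To put a number on the outage window in a log stream like the one above, a minimal Python sketch follows. It assumes every relevant line starts with an ISO-style timestamp, as in the excerpt. The helper name `outage_window` and the choice of these two messages as start and end markers are assumptions made for illustration, not anything the Functions host defines.

```python
import re
from datetime import datetime

# Marker messages copied from the log excerpt above.
FAILURE_MARKER = "Exceeded language worker restart retry count"
RECOVERY_MARKER = "Worker process started and initialized"

# Leading timestamp such as 2023-11-18T04:12:04.325 (milliseconds optional).
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d{3})?)\s")


def outage_window(log_lines):
    """Seconds between the first failure marker and the next recovery marker.

    Hypothetical helper; returns None if either marker is missing.
    """
    failed_at = None
    for line in log_lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        stamp = datetime.fromisoformat(match.group(1))
        if failed_at is None and FAILURE_MARKER in line:
            failed_at = stamp
        elif failed_at is not None and RECOVERY_MARKER in line:
            return (stamp - failed_at).total_seconds()
    return None


# Three lines taken from the excerpt above.
sample = [
    "2023-11-18T04:12:04.325 [Error] Exceeded language worker restart retry count "
    "for runtime:dotnet-isolated. Shutting down and proactively recycling the "
    "Functions Host to recover",
    "2023-11-18T04:12:09.457 [Information] Job host stopped",
    "2023-11-18T04:12:55.583 [Information] Worker process started and initialized.",
]
print(outage_window(sample))
```

For the three sample lines this works out to roughly 51 seconds between the host giving up on the worker and the worker reporting that it is initialized again.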

I've tried using the diagnostic tools within Azure Functions, and I can't seem to figure out the root cause. Here are the top two errors that seem to occur when the 502 is returned.

System.Exception | 91 | Language Worker Process exited. Pid=4704.https://aka.ms/dotnet/app-launch-failed | 11/18/2023 3:08:50 AM

Exceeded language worker restart retry count for runtime:dotnet- | 90 | Exceeded language worker restart retry count for runtime:dotnet-isolated. Shutting down and proactively recycling the Functions Host to recover | 11/18/2023 3:09:00 AM

I am using:

Here are the packages that I am using.

<ItemGroup>
    <FrameworkReference Include="Microsoft.AspNetCore.App" />
    <PackageReference Include="Azure.Identity" Version="1.10.4" />
    <PackageReference Include="Microsoft.AspNetCore.Authentication.JwtBearer" Version="8.0.0" />
    <PackageReference Include="Microsoft.Azure.Cosmos" Version="3.36.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker" Version="1.20.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.CosmosDB" Version="4.4.2" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http" Version="3.1.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Storage.Queues" Version="5.2.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Timer" Version="4.3.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Sdk" Version="1.16.2" />
    <PackageReference Include="Microsoft.Extensions.Http" Version="8.0.0" />
    <PackageReference Include="Microsoft.Graph" Version="5.35.0" />
    <PackageReference Include="Microsoft.Identity.Client" Version="4.57.0" />
    <PackageReference Include="Microsoft.Identity.Web" Version="2.15.3" />
</ItemGroup>

Please let me know if you need any more information. Thank you

kshyju commented 9 months ago

The app-launch-failed error usually happens when your app is set as a 64 bit app and your deployed payload is 32 bit.

Can you make sure you are publishing & deploying win-x64 version of your payload?

dotnet publish -c release -r win-x64 --no-self-contained

MikeWhalenIII commented 9 months ago

Hey @kshyju, thank you for your quick response. So, I am deploying the payload through Visual Studio and confirmed that everything is set to 64-bit. I also tried using 32-bit, and I was having the same issue.

I just downgraded my project and packages back to .NET 7, and everything is working as expected—no more 502 errors.

While looking at the diagnose and solve problems page, I am seeing these exceptions below when I got my last 502 error around 11/18/2023 at 4:41 PM. It seems to happen every time the drain mode is kicked off.

image

recumbented commented 9 months ago

I have the same issue. The runtime recognizes the HTTP trigger function, but invocations frequently fail.

Azure Functions on a Windows Consumption plan, runtime version 4.27.5.21549, deployed in the Japan East region.

csproj.

 <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <AzureFunctionsVersion>V4</AzureFunctionsVersion>
    <OutputType>exe</OutputType>
  </PropertyGroup>
~~~
 <ItemGroup>
    <PackageReference Include="Microsoft.ApplicationInsights.WorkerService" Version="2.21.0" />
    <PackageReference Include="Azure.Storage.Blobs" Version="12.19.1" />
    <PackageReference Include="Azure.Storage.Queues" Version="12.17.1" />
    <PackageReference Include="Azure.Extensions.AspNetCore.Configuration.Secrets" Version="1.3.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker" Version="1.19.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.ApplicationInsights" Version="1.1.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Http" Version="3.1.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Extensions.Timer" Version="4.3.0" />
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Sdk" Version="1.15.1" />
    <PackageReference Include="Microsoft.Extensions.Http" Version="8.0.0" />    
    <PackageReference Include="Azure.Identity" Version="1.10.4" />    
  </ItemGroup>

Ping.cs

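using System.Net;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Azure.Functions.Worker.Http;

// Minimal HTTP-triggered function used to reproduce the issue.
public static class Ping
{
    [Function("Ping")]
    public static HttpResponseData Run([HttpTrigger(AuthorizationLevel.Function, "get", Route = null)] HttpRequestData req)
    {
        // Respond with 200 OK and a fixed body.
        var response = req.CreateResponse(HttpStatusCode.OK);
        response.WriteString("Pong");
        return response;
    }
}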

Azure Functions Log stream (Filesystem Logs)

Connected!
2023-11-19T23:04:34  Welcome, you are now connected to log-streaming service. The default timeout is 2 hours. Change the timeout with the App Setting SCM_LOGSTREAM_TIMEOUT (in seconds).
2023-11-19T23:04:57.171 [Information] Executing 'Functions.Ping' (Reason='This function was programmatically called via the host APIs.', Id=25053997-38e3-48c7-9c96-3e984ed4ca8e)
2023-11-19T23:04:58.279 [Error] Unhandled exception. System.IO.FileLoadException: Could not load file or assembly '[REDACTED]FUNCTIONS, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.
2023-11-19T23:04:58.335 [Error] Exceeded language worker restart retry count for runtime:dotnet-isolated. Shutting down and proactively recycling the Functions Host to recover
2023-11-19T23:05:42.920 [Information] Executing 'Functions.Ping' (Reason='This function was programmatically called via the host APIs.', Id=0f38840c-b474-4293-8aa7-e4f54aa02af2)
2023-11-19T23:05:47.989 [Error] Unhandled exception. System.IO.FileLoadException: Could not load file or assembly '[REDACTED]FUNCTIONS, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.
2023-11-19T23:05:48.060 [Error] Exceeded language worker restart retry count for runtime:dotnet-isolated. Shutting down and proactively recycling the Functions Host to recover

eventlog.xml

<?xml version="1.0" encoding="UTF-8"?>
<Events>
   <Event>
      <System>
         <Provider Name=".NET Runtime" />
         <EventID>1026</EventID>
         <Level>1</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:04:13Z" />
         <EventRecordID>77369906</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-13-238</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application: dotnet.exe
CoreCLR Version: 8.0.23.53103
.NET Version: 8.0.0
Description: The process was terminated due to an unhandled exception.
Exception Info: System.IO.FileLoadException: Could not load file or assembly '[REDACTED]FUNCTIONS, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name="IIS AspNetCore Module V2" />
         <EventID>1005</EventID>
         <Level>2</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:04:13Z" />
         <EventRecordID>365559875</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Failed to gracefully shutdown application 'MACHINE/WEBROOT/APPHOST/MAWSFNPLACEHOLDER465_F_V4_NODE_16_X64'.</Data>
         <Data>Process Id: 3292.</Data>
         <Data>File Version: 16.0.23292.24. Description: IIS ASP.NET Core Module V2 Request Handler. Commit: 2aa401550574f93402eba13ff9a4827ef01a2f3a</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name="IIS AspNetCore Module V2" />
         <EventID>1033</EventID>
         <Level>4</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:04:14Z" />
         <EventRecordID>77371296</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-13-238</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application 'MACHINE/WEBROOT/APPHOST/[REDACTED]FUNCTIONS' has shutdown.</Data>
         <Data>Process Id: 5292.</Data>
         <Data>File Version: 16.0.23292.24. Description: IIS ASP.NET Core Module V2 Request Handler. Commit: 2aa401550574f93402eba13ff9a4827ef01a2f3a</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name=".NET Runtime" />
         <EventID>1026</EventID>
         <Level>1</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:04:47Z" />
         <EventRecordID>365594062</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application: dotnet.exe
CoreCLR Version: 8.0.23.53103
.NET Version: 8.0.0
Description: The process was terminated due to an unhandled exception.
Exception Info: System.IO.FileLoadException: Could not load file or assembly '[REDACTED]FUNCTIONS, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name="IIS AspNetCore Module V2" />
         <EventID>1032</EventID>
         <Level>4</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:04:48Z" />
         <EventRecordID>365594406</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application 'C:\Program Files (x86)\SiteExtensions\Functions\4.27.5\64bit\' started successfully.</Data>
         <Data>Process Id: 5692.</Data>
         <Data>File Version: 16.0.23292.24. Description: IIS ASP.NET Core Module V2 Request Handler. Commit: 2aa401550574f93402eba13ff9a4827ef01a2f3a</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name=".NET Runtime" />
         <EventID>1026</EventID>
         <Level>1</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:04:58Z" />
         <EventRecordID>365604593</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application: dotnet.exe
CoreCLR Version: 8.0.23.53103
.NET Version: 8.0.0
Description: The process was terminated due to an unhandled exception.
Exception Info: System.IO.FileLoadException: Could not load file or assembly '[REDACTED]FUNCTIONS, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name="IIS AspNetCore Module V2" />
         <EventID>1033</EventID>
         <Level>4</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:05:03Z" />
         <EventRecordID>365609828</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application 'MACHINE/WEBROOT/APPHOST/MAWSFNPLACEHOLDER465_F_V4_NODE_16_X64' has shutdown.</Data>
         <Data>Process Id: 5692.</Data>
         <Data>File Version: 16.0.23292.24. Description: IIS ASP.NET Core Module V2 Request Handler. Commit: 2aa401550574f93402eba13ff9a4827ef01a2f3a</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name=".NET Runtime" />
         <EventID>1026</EventID>
         <Level>1</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:05:37Z" />
         <EventRecordID>365643750</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application: dotnet.exe
CoreCLR Version: 8.0.23.53103
.NET Version: 8.0.0
Description: The process was terminated due to an unhandled exception.
Exception Info: System.IO.FileLoadException: Could not load file or assembly '[REDACTED]FUNCTIONS, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name="IIS AspNetCore Module V2" />
         <EventID>1032</EventID>
         <Level>4</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:05:37Z" />
         <EventRecordID>365644093</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application 'C:\Program Files (x86)\SiteExtensions\Functions\4.27.5\64bit\' started successfully.</Data>
         <Data>Process Id: 5536.</Data>
         <Data>File Version: 16.0.23292.24. Description: IIS ASP.NET Core Module V2 Request Handler. Commit: 2aa401550574f93402eba13ff9a4827ef01a2f3a</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name=".NET Runtime" />
         <EventID>1026</EventID>
         <Level>1</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:05:47Z" />
         <EventRecordID>365654296</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application: dotnet.exe
CoreCLR Version: 8.0.23.53103
.NET Version: 8.0.0
Description: The process was terminated due to an unhandled exception.
Exception Info: System.IO.FileLoadException: Could not load file or assembly '[REDACTED]FUNCTIONS, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name="IIS AspNetCore Module V2" />
         <EventID>1005</EventID>
         <Level>2</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:05:58Z" />
         <EventRecordID>365664406</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Failed to gracefully shutdown application 'MACHINE/WEBROOT/APPHOST/MAWSFNPLACEHOLDER465_F_V4_NODE_16_X64'.</Data>
         <Data>Process Id: 5536.</Data>
         <Data>File Version: 16.0.23292.24. Description: IIS ASP.NET Core Module V2 Request Handler. Commit: 2aa401550574f93402eba13ff9a4827ef01a2f3a</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name="IIS AspNetCore Module V2" />
         <EventID>1032</EventID>
         <Level>4</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:06:32Z" />
         <EventRecordID>365698781</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application 'C:\Program Files (x86)\SiteExtensions\Functions\4.27.5\64bit\' started successfully.</Data>
         <Data>Process Id: 7096.</Data>
         <Data>File Version: 16.0.23292.24. Description: IIS ASP.NET Core Module V2 Request Handler. Commit: 2aa401550574f93402eba13ff9a4827ef01a2f3a</Data>
      </EventData>
   </Event>
   <Event>
      <System>
         <Provider Name=".NET Runtime" />
         <EventID>1026</EventID>
         <Level>1</Level>
         <Task>0</Task>
         <Keywords>Keywords</Keywords>
         <TimeCreated SystemTime="2023-11-19T23:06:32Z" />
         <EventRecordID>365698812</EventRecordID>
         <Channel>Application</Channel>
         <Computer>10-30-2-84</Computer>
         <Security />
      </System>
      <EventData>
         <Data>Application: dotnet.exe
CoreCLR Version: 8.0.23.53103
.NET Version: 8.0.0
Description: The process was terminated due to an unhandled exception.
Exception Info: System.IO.FileLoadException: Could not load file or assembly '[REDACTED]FUNCTIONS, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.</Data>
      </EventData>
   </Event>
</Events>
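
A long eventlog.xml like this one is easier to read once the events are tallied. The Python sketch below counts events per provider and event ID. It assumes the same `<Events>/<Event>/<System>` layout shown above; the function name `summarize_events` and the small inline sample are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_events(xml_text):
    """Count events per (provider name, event ID) pair. Hypothetical helper."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for event in root.iter("Event"):
        provider = event.find("System/Provider")
        name = provider.get("Name", "?") if provider is not None else "?"
        event_id = event.findtext("System/EventID", default="?")
        counts[(name, event_id)] += 1
    return counts


# Trimmed-down sample in the same shape as the file above.
sample_xml = """
<Events>
  <Event><System><Provider Name=".NET Runtime" /><EventID>1026</EventID></System></Event>
  <Event><System><Provider Name="IIS AspNetCore Module V2" /><EventID>1005</EventID></System></Event>
  <Event><System><Provider Name=".NET Runtime" /><EventID>1026</EventID></System></Event>
</Events>
"""

for (provider_name, event_id), count in summarize_events(sample_xml).most_common():
    print(f"{count:3d}  {provider_name}  {event_id}")
```

For a real file, passing the raw bytes (for example `open("eventlog.xml", "rb").read()`) to the same function should work, since `ET.fromstring` accepts bytes as well as text.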

rodolfograve commented 9 months ago

I am having the same (or at least similar) issue, except that my host is not starting at all. I can't see as much data in the eventlog.xml file as @recumbented but I can see events with EventID '02000780' which according to err.exe is 'ERROR_CANT_ACCESS_FILE - The file cannot be accessed by the system'.

I suspect this is the issue: EventID 2294, "Each application pool has a managedRuntimeVersion attribute that contains the version of the common language runtime (CLR) that the application pool preloads. If the value in the attribute does not match the installed runtime, the worker process may fail to load."

I have added more details in #2078.

I'm deploying using exactly the same settings I was using before for .NET 7 isolated. I double checked and it's x64.

The most frustrating bit is that I can't find anything useful in any of the log files, even after setting 'fileLoggingMode' to 'always'.

image

Home page: image

jenspettersson commented 9 months ago

We have the same problem with 502's for a couple of minutes after a new instance is started. We use Consumption Plan on Windows.

I've deployed a sample function app with just one http trigger on both a Consumption Windows and a Consumption Linux and the one using Windows always fails with 502 for a couple of minutes. The Windows version got the following logs:

image

This clearly states that the .NET 8 runtime isn't installed.

We are in contact with support, and they say this is not a bug: .NET 8 is still in "Early Access" and not readily available on every instance, but is instead downloaded and installed just in time on startup.

But for us, about 3 minutes of 502s until the runtime is installed is not OK on the Consumption plan, because we get new instances frequently.

I would say that .NET 8 is not ready to use on the Consumption plan on Windows yet.

MikeWhalenIII commented 9 months ago

Interesting, this sounds like it is the main issue because I only experienced the 502 errors for a couple of minutes as well, and then magically, everything is working.

fabiocav commented 9 months ago

@MikeWhalenIII can you confirm things are in a good state now?

We'll look at the details so we can root cause here, but it's possible this was caused by a temporary issue during the rollout for some of your workers.

jenspettersson commented 9 months ago

@fabiocav I can't answer for @MikeWhalenIII, but as I mentioned in this thread, we're also experiencing "intermittent" 502s on Azure Functions running on the Windows Consumption plan. We're still having this issue and were forced to roll back all our .NET 8 upgrades. Here are some screenshots from a test just made on a sample Azure Functions app on both the Linux Consumption plan and the Windows Consumption plan. The test is a complete cold start (idle for about 12 hours):

image

The Linux version wakes up and responds in ~4 seconds, but the Windows version had a really hard time this morning... 9 times in a row it responds with a 502 after about 48 seconds each time. Here's a screenshot from the logs during this period (local time is UTC+1):

image

It took around 7 minutes for the instance to start working. This is hardly anything we can use in a production environment.
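
A cold-start test like this can be scripted so the number of failed attempts and the total time to the first 200 are recorded consistently. The Python sketch below is one way to do it, under stated assumptions: `http_status` and `probe_until_ok` are hypothetical helper names, and the URL in the usage comment is a placeholder, not a real endpoint from this thread.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=120):
    """Return the HTTP status code of a GET request, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 502 and other error statuses arrive here


def probe_until_ok(fetch, max_attempts=20, pause_seconds=1.0):
    """Call fetch() until it returns 200 or max_attempts is reached.

    Returns (attempts, elapsed_seconds, statuses).
    """
    started = time.monotonic()
    statuses = []
    for _ in range(max_attempts):
        status = fetch()
        statuses.append(status)
        if status == 200:
            break
        time.sleep(pause_seconds)
    return len(statuses), time.monotonic() - started, statuses


# Example usage (placeholder URL, replace with your own function endpoint):
# attempts, elapsed, statuses = probe_until_ok(
#     lambda: http_status("https://example.invalid/api/Ping"))
# print(attempts, round(elapsed, 1), statuses)
```

Passing the request as a callable keeps the retry loop separate from the network call, so the loop can be exercised with a stub. Connection failures and timeouts are not caught here and will surface as exceptions.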

We thought that the Azure instances would be ready to use by the time .net8 was finally released on November 14th, but that doesn't seem to be the case.

Unless we're missing something... In our support ticket they say that .net8 is only released as an "early access" but we thought that the preview versions and perhaps the RC versions could be thought of as "early access" but not the final version?

Currently, dotNet8 runtime is available as part of the early access program on App Services. This can be considered as GA...

and

...behavior won’t be occurred in the future once dotNet8 final released as GA to the public.

Please correct me if I'm wrong, but isn't .NET 8.0.0 (8.0.100) a "final release to the public"?

image

Sorry about the rant, but we're just confused about what's "final" and what's "early access"; we should not have updated our services if the November 14th release was just an "early access".

rodolfograve commented 9 months ago

I believe this post means .NET 8 is supposed to be working: https://azure.microsoft.com/en-us/updates/ga-azure-functions-supports-net-8-in-the-isolated-worker-model/

It does seem wrong to say that ".NET 8 is the new recommended default for .NET function apps" only a few lines after "Windows apps might experience additional impact from cold starts".

image

In any case, we are not just experiencing "additional impact from cold starts".

The absence of comments from the team in these tickets tells me they are probably working hard to solve it, but it would be great to get someone to say something here (or anywhere else!)

ChristophHornung commented 9 months ago

I am having the same (or at least similar) issue, except that my host is not starting at all. I can't see as much data in the eventlog.xml file as @recumbented but I can see events with EventID '02000780' which according to err.exe is 'ERROR_CANT_ACCESS_FILE - The file cannot be accessed by the system'.

I suspect this is the issue: EventID 2294, "Each application pool has a managedRuntimeVersion attribute that contains the version of the common language runtime (CLR) that the application pool preloads. If the value in the attribute does not match the installed runtime, the worker process may fail to load."

Interestingly I am seeing the same error albeit not in functions but in my Azure App Service after trying to update to .net8. They appear one after the other and no real other log seems to be written.

  <Event>
        <System>
            <Provider Name="W3SVC-WP"/>
            <EventID>2294</EventID>
            <Level>1</Level>
            <Task>0</Task>
            <Keywords>Keywords</Keywords>
            <TimeCreated SystemTime="2023-11-21T08:18:27Z"/>
            <EventRecordID>-1668458702</EventRecordID>
            <Channel>Application</Channel>
            <Computer>WN1MDWK0000DV</Computer>
            <Security/>
        </System>
        <EventData>
            <Data>v8.0</Data>
            <Binary>02000780</Binary>
        </EventData>
    </Event>
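
One caveat about the `<Binary>` value in this event: it is a hex dump of raw bytes, and Windows commonly stores 32-bit status codes little-endian. Read that way, `02000780` becomes 0x80070002, whose low 16 bits are Win32 error 2 (ERROR_FILE_NOT_FOUND), rather than the 1920 (ERROR_CANT_ACCESS_FILE) you get by reading the digits left to right. Which reading applies to this event is not confirmed anywhere in this thread; the Python sketch below just shows both. The function name `decode_binary_field` is hypothetical.

```python
import struct

# Two Win32 error codes relevant here (values from winerror.h).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    1920: "ERROR_CANT_ACCESS_FILE",
}


def decode_binary_field(hex_text):
    """Interpret a 4-byte hex dump both little-endian and big-endian."""
    raw = bytes.fromhex(hex_text)
    (little,) = struct.unpack("<I", raw)
    (big,) = struct.unpack(">I", raw)
    return {
        "little_endian": f"0x{little:08X}",
        # The low 16 bits of an HRESULT built from a Win32 error hold that error.
        "little_endian_win32": WIN32_ERRORS.get(little & 0xFFFF),
        "big_endian": f"0x{big:08X}",
        "big_endian_win32": WIN32_ERRORS.get(big & 0xFFFF),
    }


for key, value in decode_binary_field("02000780").items():
    print(f"{key}: {value}")
```

If the little-endian reading is the right one, the event would point at a missing file rather than an inaccessible one, which would be consistent with the runtime not being installed on the instance yet. That is an inference, not something the team confirmed.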

I reckon since Azure Functions and App Services use similar tooling the source of the problem is the same. I had to roll back to .net7 for now on my App Service. For App Services there seems to be no real bug-reporting anywhere on github so I hope once this is solved the App Service problems will be solved as well.

rodolfograve commented 9 months ago

Yes, @ChristophHornung. As far as I know, Azure Functions runs on App Service, and it's App Service that owns the IIS/w3wp process.

Those logs we are seeing come from the IIS service. Fingers crossed for a prompt resolution.

mattchenderson commented 9 months ago

Hi all - yes, we are continuing to work on things here. Thank you for sharing details of what you're seeing!

To provide some additional context, the "early access" framing from App Service and Azure Functions indicates that the runtime is available on Windows, with a strict requirement on having the site's netFrameworkVersion property set properly so that everything can be installed for the app. The system goes through a performance optimization process that gets kicked off right away on GA day, and that's what determines the "early access" period. Basically, it's baking the runtime in as a more universal assumption.

The optimization process is nearing completion, and it should resolve most of the issues that folks are seeing. It is likely that the "early access" tag will linger a bit past that point just due to timing (UX won't be updated right away due to holidays in the US). Regardless, we'll post back here once things are optimized in full.

We're really looking forward to seeing more apps light up on .NET 8. Thanks for your excitement and for engagement with us on this issue!

jenspettersson commented 9 months ago

To provide some additional context, the "early access" framing from App Service and Azure Functions indicates that the runtime is available on Windows, with a strict requirement on having the site's netFrameworkVersion property set properly so that everything can be installed for the app.

Thank you for your reply @mattchenderson, much appreciated. So with the strict requirement on having netFrameworkVersion property set properly, it should work?

image

The tests from this morning:

image

Perhaps not related, but after 8 tries with 502 after ~40+ seconds each, I tried again and at the same time went to the Azure Portal just to check that something wasn't fishy with the function app there. As soon as the portal loaded the app, I got a 200 OK... Probably just a coincidence... 🤷🏼‍♂️

ChristophHornung commented 9 months ago

@mattchenderson Anything I can do to work around the problem in Azure App Service? The Configuration->.net Version is set to .net8, do I need to set the netFrameworkVersion somewhere else?

Tiktack commented 9 months ago

Facing the same issue with Azure Functions on a Consumption plan on Windows. And in the East US 2 region there is no longer a label saying it's Early Access.

image

rodolfograve commented 9 months ago

UK South region, Consumption Windows plan, still having the same issues.

Is this still expected? It's been 10 days since the official .NET 8 release.

image

image

image

image

image

rodolfograve commented 9 months ago

It could be unrelated, but it sure seems like it is and that things are getting worse. Our production instances (running on an Elastic Plan) have been stopped and can't be restarted now. We have made no changes to it: no settings changed, no deployments, still running on .NET 7.

image

The troubleshooting process sends me to this page: https://learn.microsoft.com/en-gb/dotnet/core/runtime-discovery/troubleshoot-app-launch?pivots=os-windows

image image

Even the Overview in the Portal has errors: image

What's going on?!

mattchenderson commented 9 months ago

Hi - I'm catching up on our deployment state. It remains ongoing, and we're seeing what can be done to get it closed out ASAP. Again, thanks for reporting everything you're seeing.

@jenspettersson That may be a coincidence. But if netFrameworkVersion was set, we would expect it to work, albeit with a higher cold start. Though what you've described seems to exceed the expected impact. The portal loading is similar to a request in terms of warming up the site, so I suspect the test requests at that same time would have had the same result.

@ChristophHornung That is the correct place. That UX maps to the netFrameworkVersion property on the underlying resource definition. Technically that property is on the /sites/<siteName>/config/web object in the Azure Resource Manager.
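
If you have the JSON for that `config/web` object in hand, however you obtained it, a quick scripted check of the property could look like the Python sketch below. The function names are hypothetical, the expected value `v8.0` is an assumption based on the `v8.0` string seen in the event log earlier in this thread, and the handling of a nested `properties` object is a guess at the two shapes such JSON might take.

```python
import json


def net_framework_version(config_json):
    """Extract netFrameworkVersion from a site config JSON document."""
    data = json.loads(config_json)
    # Assumption: the setting is either top-level or nested under "properties".
    props = data.get("properties", data)
    return props.get("netFrameworkVersion")


def is_set_for_dotnet8(config_json, expected="v8.0"):
    """True if the property matches the expected value (assumed to be v8.0)."""
    return net_framework_version(config_json) == expected


nested = '{"name": "web", "properties": {"netFrameworkVersion": "v8.0"}}'
flat = '{"netFrameworkVersion": "v6.0"}'
print(is_set_for_dotnet8(nested))
print(is_set_for_dotnet8(flat))
```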

@Tiktack I'll have to double-check on that UX element. To my knowledge, no UX update has occurred, so it's possible that dropdown wasn't properly updated to add "Early Access" in the first place. We are still in the "Early Access" period, and you should still see "Early Access" if attempting to create a new app, at least.

@rodolfograve Any issues with .NET 8 would unfortunately still be expected. Regardless, an existing .NET 7 should not have been impacted. Please open a support ticket for your production apps if you haven't already.

jenspettersson commented 9 months ago

Hi - I'm catching up on our deployment state. It remains ongoing, and we're seeing what can be done to get it closed out ASAP. Again, thanks for reporting everything you're seeing.

Thanks again for the reply. Our test app in region East US has now been working correctly for a couple of days, around 3-4 second cold start on first request. Perfectly fine.

But the same app on region West Europe still gets several 502's after 40+ sec each, so I guess the ongoing deployment hasn't reached all locations yet.

Out of curiosity, was this an error or was it expected behavior, just not very well documented? Would be good to know on the next major .NET release if we should wait a couple of weeks/months after the release to suggest to our clients to upgrade to latest LTS.

rodolfograve commented 9 months ago

Yes, @jenspettersson. I feel like the real issue here is not that .NET 8 is not yet supported by Azure Functions but that the Azure Functions team said it was: https://azure.microsoft.com/en-us/updates/ga-azure-functions-supports-net-8-in-the-isolated-worker-model/

Setting the right expectations and communicating better would have saved all of us from wasting time on this and would have given the Azure Functions team more space to finish whatever it is they need to finish.

Thanks for replying, @mattchenderson. I do have a ticket for someone to look into my .NET 7 function. A couple of redeployments fixed it and the support team is expected to provide a root cause analysis.

mattchenderson commented 9 months ago

Just a quick update: the process I outlined earlier is now complete for public regions. This should address most of the issues that folks are seeing related to .NET 8 installation. Hopefully, if you try again, you should see things working. Please let us know the results.

Apps could still be seeing issues for reasons beyond the "Early Access" issue we've been discussing here. Right now, the main one I'd flag is the worker SDK version: I'd recommend using version 1.15.1 while we address some issues that can situationally occur with 1.16.*. Those are more edge case, but if you are still seeing issues, I'd double-check the version there and consider a temporary downgrade.
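
One quick way to double-check the version is to read it out of the project file. This is a minimal sketch in Python, assuming an SDK-style .csproj without an XML namespace and a `PackageReference` for `Microsoft.Azure.Functions.Worker.Sdk`. The helper names are illustrative and not part of any Azure tooling.

```python
import xml.etree.ElementTree as ET

SDK_PACKAGE = "Microsoft.Azure.Functions.Worker.Sdk"


def worker_sdk_version(csproj_text):
    """Return the referenced Worker.Sdk version string, or None if absent."""
    root = ET.fromstring(csproj_text)
    for ref in root.iter("PackageReference"):
        if ref.get("Include") == SDK_PACKAGE:
            # The version can be an attribute or a child <Version> element.
            return ref.get("Version") or ref.findtext("Version")
    return None


def is_1_16_series(version):
    """True when the version string starts with 1.16 (the series flagged above)."""
    return version is not None and version.split(".")[:2] == ["1", "16"]


SAMPLE = """<Project Sdk="Microsoft.NET.Sdk">
  <ItemGroup>
    <PackageReference Include="Microsoft.Azure.Functions.Worker.Sdk" Version="1.16.2" />
  </ItemGroup>
</Project>"""

if __name__ == "__main__":
    version = worker_sdk_version(SAMPLE)
    print(version, is_1_16_series(version))
```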

Again, thanks all for engaging, for providing details of what you're seeing, and for your patience as we've been working through this.

This issue is to remain open so that we can collect any further reports.

rodolfograve commented 9 months ago

Thank you, @mattchenderson. Unfortunately, I'm still experiencing exactly the same issue. I've tried re-deploying more than once and restarting the App from the Portal: no change - the host is still not running, home page says 503 and neither the "Runtime version" nor the list of functions is reported correctly.

image

Also, same 2276 and 2294 entries in the eventlog.xml file: image

Downgrading Microsoft.Azure.Functions.Worker.Sdk to 1.15.1 didn't make any difference:

image

Any ideas?

hamfastgamgee commented 9 months ago

We're seeing something very similar in our set of functions in East US 2. If I do the .NET 8 upgrade steps exactly as recommended, I get the same problem with functions not showing up, sometimes with a null reference exception reported in the Portal console view. We can get the functions to show up by explicitly disabling worker indexing in the .csproj, but they don't actually do anything even configured like that, whether I'm on Worker.Sdk 1.16.2 or 1.15.1. (Explicitly disabling worker indexing might not be necessary on 1.15.1 since it's still off by default in that version; I'm not sure if we ever isolation tested 1.15.1 + no .csproj disabling.)

Everything is fine on .NET 7, and the functions run fine locally in .NET 8.

kshyju commented 9 months ago

@MikeWhalenIII Hey there! Wondering if you're still facing the same issues? If you are, could you let us know and mention the region it's happening in? We're here to help and appreciate your ongoing patience!

Hubert-Rybak commented 9 months ago

Exactly the same as @rodolfograve reported above, on East US.

cvium commented 9 months ago

Same issue in Central US

mattchenderson commented 9 months ago

We think we have identified a common cause and are actioning it. I'm posting this before I have an ETA. We know other regions to be impacted beyond those listed in this thread already. Fix will cover all.

jtterry2856 commented 9 months ago

@mattchenderson Any update on this issue? I am experiencing the same problems mentioned above. Thanks!

MikeWhalenIII commented 9 months ago

Hey @kshyju, last I checked, I was still having the issue. I will check again once the issue @mattchenderson identified above is resolved. My function app is in East US. Thanks

rodolfograve commented 9 months ago

I'm happy to report that things are working OK for me since yesterday, both on Consumption Plan and Elastic. UK South.

mattchenderson commented 9 months ago

Glad to hear! I unfortunately still see the issue in UK South, but the situation there (and in most regions) has improved. You would be less likely to hit it than at time of my last message, but it is still possible to encounter right now.

GitDaBytes commented 9 months ago

Not sure if related at all, but I have just upgraded our .NET 6 Web App to .NET 8 and deployed to Azure App Services (Central US) and appear to be getting exactly the same issues and errors as stated above. I realize this thread is targeting Azure Functions, but is there any chance of overlap here? If I set the .NET version to .NET 7, it runs fine; if I set it to .NET 8 (early release), I get all the errors and behavior described above for Functions. I can create a new thread if this is better kept separate.

zachgreen commented 9 months ago

I had the same issue and was forced to bin-deploy the .net 8 to get the app to work. I have not been able to go back, and retry without the bin-deploy to see if the issue is still happening.

Hubert-Rybak commented 9 months ago

East US: a little better, but the host still often fails to start at random. Rolling back.

mattchenderson commented 9 months ago

This is indeed expected to impact Web Apps as well. The issue sits at a core layer below Functions, and the fix being tracked addresses all app types.

GitDaBytes commented 9 months ago

Thanks for confirming.

mattchenderson commented 9 months ago

At time of posting, the following regions still have the potential to present the issue:

Any region not on that list should be good to go already.

Edit: I should qualify this by noting that if you are using an App Service Environment (ASE), there is still a possibility of this issue, depending. The statements about regional support apply to Consumption, Elastic Premium, and non-ASE App Service plans.

GitDaBytes commented 9 months ago

Thanks - do you have an ETA for the other regions? i.e. are they hours, days or weeks away? Thanks!

mattchenderson commented 9 months ago

@GitDaBytes I am limited in the forward-looking statements I am authorized to make. Some regions will certainly close out faster than others. Unfortunately, I do expect Central US to be one of the slower ones, though we are making good progress there, so I could end up wrong on that.

Brazil South should no longer present the issue in question.

GitDaBytes commented 9 months ago

@mattchenderson ok thanks v much. Central US is the one i am after. I have fingers crossed it goes smoothly :)

mattchenderson commented 9 months ago

Central India and UK South should no longer present the issue.

mattchenderson commented 9 months ago

Australia Central 2, Australia Southeast, Germany West Central, Southeast Asia, and Switzerland West should no longer present the issue.

At this point, only East US, Central US, and some ASE setups should be able to run into this.

mattchenderson commented 9 months ago

East US should no longer present the issue.

As a heads-up, our plan is to resolve this issue after Central US completes and folks here have been able to confirm. Updates may still be rolling for App Service Environment (ASE) setups, but for those, a support ticket is going to be the best way to track any ongoing issues.

MikeWhalenIII commented 9 months ago

@mattchenderson I upgraded my function app to .NET 8 nine hours ago, and I have not received any 502 errors! It looks like East US is good. Thank you for all of your help!

GitDaBytes commented 9 months ago

@mattchenderson just checking if there was any update on Central US? Thanks!

mattchenderson commented 8 months ago

@GitDaBytes Apologies for missing this. The update is ongoing but has made great progress. While significantly less likely, at the moment it's still possible to encounter the issue, though.

GitDaBytes commented 8 months ago

@mattchenderson thank you so much for the update! I just tried a .NET 8 web app on Central US and it does indeed appear to be working. Thank you so much for the hard work!

mattchenderson commented 8 months ago

Central US should no longer present this issue.

This is now true for all public regions, with the noted exception for ASEs. Again, if anyone encounters this for an ASE, we ask that you open a support ticket, as that will lead to the fastest resolution.

We will now consider this resolved, though we will leave this issue open for a little while (into the beginning of January) so that folks can confirm.

I want to reiterate my sincere thanks for everyone's engagement and patience. These kinds of issues are especially frustrating when they take away from the excitement about new things like .NET 8. But the thread here was super helpful in tracking things down, validating, and celebrating incremental progress. Thank you, and I hope all of you have a wonderful end of year. Here's to a 2024 full of .NET 8!

vexingpos commented 8 months ago

I wasn't sure if this warranted a separate issue but my function app is in East US 2 (using FUNCTIONS_WORKER_RUNTIME = dotnet-isolated + netFrameworkVersion = v8.0 + use32BitWorkerProcess = false) and exhibiting the exact same error exceptions described by the OP in #9686

I managed to get around this by using <FunctionsEnableWorkerIndexing>False</FunctionsEnableWorkerIndexing> in my Functions csproj

Mentioning here as #9686 was closed with a request to follow up here.
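
The workaround above amounts to one property in the project file. As a rough sketch, the Python below adds or updates `FunctionsEnableWorkerIndexing` in an SDK-style .csproj passed in as text. The function name is hypothetical, the code assumes the project file has no XML namespace, and `xml.etree.ElementTree` drops comments and may change whitespace, so treat it as an illustration rather than a safe in-place editor.

```python
import xml.etree.ElementTree as ET

PROPERTY = "FunctionsEnableWorkerIndexing"


def disable_worker_indexing(csproj_text):
    """Return csproj XML with <FunctionsEnableWorkerIndexing>False</...> set."""
    root = ET.fromstring(csproj_text)
    node = next(root.iter(PROPERTY), None)
    if node is None:
        # Reuse the first PropertyGroup, or create one if the project has none.
        group = root.find("PropertyGroup")
        if group is None:
            group = ET.SubElement(root, "PropertyGroup")
        node = ET.SubElement(group, PROPERTY)
    node.text = "False"
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
  </PropertyGroup>
</Project>"""

if __name__ == "__main__":
    print(disable_worker_indexing(SAMPLE))
```

As reported earlier in this thread, the property made functions show up for some apps but did not make them run, so it is a diagnostic step rather than a guaranteed fix.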

Quintinon commented 8 months ago

I'm experiencing this same issue in the Azure Gov US Virginia region with a Consumption plan .NET 8 isolated process. Is there any information about whether this is being tracked in the USGov area, or should it already be fixed over there too?

I have my Azure function set to .NET 8 Isolated. But when I run dotnet --info I see that it's on RC2, not the GA version. image

image
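
Telling an RC build from a GA build comes down to whether the reported version carries a prerelease suffix. Here is a minimal sketch in Python, assuming version strings of the form `major.minor.patch` with an optional `-suffix`. The example strings are illustrative and are not copied from the screenshots above.

```python
import re

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def parse_dotnet_version(text):
    """Split a version string into (major, minor, patch, prerelease_or_None)."""
    match = _VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch, prerelease = match.groups()
    return int(major), int(minor), int(patch), prerelease


def is_prerelease(text):
    """True when the version has a suffix such as rc.2 (not a GA build)."""
    return parse_dotnet_version(text)[3] is not None


if __name__ == "__main__":
    # Illustrative inputs only.
    for candidate in ("8.0.0-rc.2.23479.6", "8.0.0"):
        print(candidate, is_prerelease(candidate))
```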