aws / aws-extensions-for-dotnet-cli

Extensions to the dotnet CLI to simplify the process of building and publishing .NET Core applications to AWS services
Apache License 2.0

.Net Core Package command fails to target linux-x64 #220

Closed sykesbPragmatics closed 2 years ago

sykesbPragmatics commented 2 years ago

Describe the bug

We have hit this issue periodically across several .NET Core Lambda projects, and I have never been able to nail down the root cause. To be clear, it also occurred when we were using .NET Core 3.1, and we had not seen it (at least with local builds) for several months.

All our builds run on Windows machines. Generally I only see this on our AWS CodeBuild Windows 2019 build, but today I was able to reproduce it locally on a Windows 10 machine.

Expected Behavior

We expect the publish to produce a zip that is ready for use on AWS Lambda, with a runtime target of .NETCoreApp,Version=v6.0/linux-x64 in the deps.json file.

Current Behavior

Instead we end up with a zip file with a runtime target of .NETCoreApp,Version=v6.0, which in theory should work but always ends up with a runtime error similar to this:

at Amazon.Lambda.AspNetCoreServer.AbstractAspNetCoreFunction`2.ProcessRequest(ILambdaContext lambdaContext, Object context, InvokeFeatures features, Boolean rethrowUnhandledError)
FileNotFoundException:
System.IO.FileNotFoundException: Could not load file or assembly 'System.Data.SqlClient, Version=4.6.1.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'. The system cannot find the file specified.
File name: 'System.Data.SqlClient, Version=4.6.1.1, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'

To be clear, the System.Data.SqlClient dll exists in the bin directory and a reference redirect also exists, but the app never seems to be able to find it.

Reproduction Steps

All builds/publishes are run from the command line with the following parameters:

dotnet lambda package -pl <project> -o <zip location>

I tried deleting the obj directories for all projects, which resulted in exactly the same zip file contents (compared with a diff tool).

Next I found that the dll itself had a slight version upgrade available (4.8.2 to 4.8.3). After the upgrade, the newly published version worked properly in Lambda. However, I don't believe the dll upgrade is actually what solved the problem.

Comparing the .deps.json files shows these critical differences:

Bad:

{
  "runtimeTarget": {
    "name": ".NETCoreApp,Version=v6.0",
    "signature": ""
  },
  "compilationOptions": {},
  "targets": {
    ".NETCoreApp,Version=v6.0": {
...........

Good:

{
  "runtimeTarget": {
    "name": ".NETCoreApp,Version=v6.0/linux-x64",
    "signature": ""
  },
  "compilationOptions": {},
  "targets": {
    ".NETCoreApp,Version=v6.0": {},
    ".NETCoreApp,Version=v6.0/linux-x64": {
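
One way to check a published folder for the difference above is a small shell sketch like the following. It assumes a POSIX shell (for example Git Bash on Windows, or a Linux build image) and a pretty-printed .deps.json as in the snippets above, where the first "name" key belongs to runtimeTarget. The function names runtime_target_name and has_runtime_target are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Print the runtimeTarget name recorded in a .deps.json file.
# Assumption: the file is pretty-printed (one key per line) and the first
# "name" key in the file belongs to the "runtimeTarget" object.
runtime_target_name() {
    sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed only when the runtime target ends with "/<rid>".
# Usage: has_runtime_target <path to .deps.json> <rid>
has_runtime_target() {
    case "$(runtime_target_name "$1")" in
        */"$2") return 0 ;;
        *)      return 1 ;;
    esac
}
```

A build step could then call something like `has_runtime_target path/to/publish/PROJECTNAME.deps.json linux-x64 || echo "linux-x64 target missing"` before the zip is uploaded (the path and file name here are placeholders).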

There are a lot of other differences in the deps.json as well, and they all seem to be related to the fact that the "broken" version does not expect to target linux-x64. The odd thing is that the command line output of "dotnet lambda package" DEFINITELY includes the --runtime option in both the working and failing versions:

C:\Users\xxxxx\source\repos\Aws\PROJECTNAME>dotnet lambda package -pl . -o c:\temp\Project.zip
Amazon Lambda Tools for .NET Core applications (5.4.1)
Project Home: https://github.com/aws/aws-extensions-for-dotnet-cli, https://github.com/aws/aws-lambda-dotnet

Executing publish command
Deleted previous publish folder
... invoking 'dotnet publish', working folder 'C:\Users\xxxxxxx\source\repos\Aws\PROJECTNAME\.\bin\Release\net6.0\publish'
... dotnet publish "C:\Users\xxxxxx\source\repos\Aws\PROJECTNAME\." --output "C:\Users\xxxxxx\source\repos\Aws\PROJECTNAME\.\bin\Release\net6.0\publish" --configuration "Release" --framework "net6.0" /p:GenerateRuntimeConfigurationFiles=true --runtime linux-x64 --self-contained false
... publish: Microsoft (R) Build Engine version 17.2.0+41abc5629 for .NET
... publish: Copyright (C) Microsoft Corporation. All rights reserved.

In the end, the thing that I think triggered the "fix" was resolving NuGet dependencies. The reason I believe this to be the case is the following sequence of events:

  1. Run publish with dll version 4.8.2: results in a broken publish.
  2. Update the dll to 4.8.3: the publish output has extra verbiage about resolving NuGet packages for the relevant projects and results in a valid publish.
  3. Roll back the dll to 4.8.2 in the projects: running the publish again results in a VALID publish.

The above implies that the dll version is not the fix; it's something to do with the NuGet package restore preventing the linux runtime target from being included.
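
A rough way to test that theory is to look at what the restore actually produced. The sketch below assumes the restore output lives in obj/project.assets.json under the project directory and that runtime-specific targets appear there as keys ending in "/<rid>" (for example "net6.0/linux-x64"). That layout is an SDK implementation detail and an assumption here, and restored_for_rid is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the restore output of a project lists a target for the given RID.
# Assumption: RID-specific targets show up in obj/project.assets.json as
# keys ending in "/<rid>", e.g. "net6.0/linux-x64".
# Usage: restored_for_rid <project dir> <rid>
# Returns 0 if found, 1 if not found, 2 if the assets file does not exist.
restored_for_rid() {
    assets="$1/obj/project.assets.json"
    [ -f "$assets" ] || return 2
    grep -q "/$2\"" "$assets"
}
```

Running it against the Lambda project and each project it references, before and after a failing publish, might show whether one of them was restored without linux-x64.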

I will also say that currently ALL of our builds in AWS CodeBuild are failing with this problem and it's 100% reproducible. Those builds run a normal Web Project build first and the Lambda projects last. I am going to try reversing the order of those builds to see if that makes a difference and will update this bug with the results.

Possible Solution

I'm not at all certain that this is not a .NET Core bug as opposed to something in the dotnet lambda command line tool, but I'm starting here as this is directly causing my .NET Lambda projects to fail at random times during publish.

At this time I do not have a solution, other than possibly changing the order in which the NuGet package restore happens.

Additional Information/Context

No response

Targeted .NET platform

.Net 6

CLI extension version

Amazon Lambda Tools for .NET Core applications (5.4.1)
Microsoft (R) Build Engine version 17.2.0+41abc5629 for .NET

Environment details (OS name and version, etc.)

Windows 10, Windows Server 2019 (AWS CodeBuild)

ashishdhingra commented 2 years ago

Hi @sykesbPragmatics,

Thanks for reporting the issue. Could you please provide a sample project to reproduce the issue? Also, what is the output of dotnet --version in your environment? You mentioned that for System.Data.SqlClient, upgrading from 4.8.2 to 4.8.3 resolves the issue. Unfortunately, I'm unable to find a change log for System.Data.SqlClient, so I'm not sure what changed between these versions.

Regards, Ashish

sykesbPragmatics commented 2 years ago

@ashishdhingra Result of dotnet --version is 6.0.300

If you read down in the details, I think the upgrade of System.Data.SqlClient is not the root cause, but it causes a NuGet package restore that does trigger whatever "fix" is going on. This is backed up by the fact that if I roll back the version, the next run with the older version "works".

The error is totally random, so I don't know if I can provide a sample project with the issue.

sykesbPragmatics commented 2 years ago

I believe that I have identified the root cause. Our Lambda projects were referencing other projects and the NuGet restore was not properly retrieving the linux-x64 dependencies.

Possible solutions: Explicitly identify the runtime identifiers that are needed for your builds. In our case both win-x64 and linux-x64 are needed since we develop on Windows and Docker runs on Linux.

<PropertyGroup>
  <TargetFramework>net6.0</TargetFramework>
  <RuntimeIdentifiers>win-x64;linux-x64</RuntimeIdentifiers>
</PropertyGroup>

OR explicitly call the dotnet restore command with the runtime identifier prior to the build:

dotnet restore -r linux-x64

In this case, even though the --runtime option is specified during the publish, the projects/restore were not using it properly.
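
For a build script, the second option could be wrapped up roughly as below. This is only a sketch under the assumptions in this thread: it uses the same dotnet restore and dotnet lambda package invocations shown above, while the function name package_for_lambda and the LAMBDA_RID variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Restore for the Lambda runtime first, then package.
# Usage: package_for_lambda <project dir> <output zip>
package_for_lambda() {
    project="$1"
    zip="$2"
    # LAMBDA_RID is a hypothetical override; it defaults to linux-x64.
    rid="${LAMBDA_RID:-linux-x64}"

    # Explicit restore with the runtime identifier, as suggested above.
    # Stop here if the restore fails.
    dotnet restore "$project" -r "$rid" || return 1

    # Same package command as in the original report.
    dotnet lambda package -pl "$project" -o "$zip"
}
```

Calling `package_for_lambda . /tmp/Project.zip` from the project directory would then run the restore and the package step in that order (the zip path is a placeholder).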
