dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.06k stars 4.69k forks source link

Building CORE_ROOT is broken #107632

Open BruceForstall opened 3 weeks ago

BruceForstall commented 3 weeks ago

src\tests\build.cmd x64 checked generatelayoutonly fails with

c:\gh\runtime\artifacts\bin\coreclr\windows.x64.Checked\build\Microsoft.NETCore.Native.Publish.targets(77,5): error : The PrivateSdkAssemblies ItemGroup is required for _ComputeAssembliesToCompileToNative [c:\gh\runtime\src\coreclr\tools\aot\crossgen2\crossgen2_publish.csproj]

full log:

c:\gh\runtime>src\tests\build.cmd x64 checked generatelayoutonly
BUILDTEST: Starting Build at 10:19:41.63
BUILDTEST: Commencing CoreCLR test build
BUILDTEST: Common MSBuild args: "/p:TargetOS=windows" "/p:Configuration=Checked" "/p:TargetArchitecture=x64" "/p:RuntimeFlavor=coreclr" /nologo /verbosity:minimal /clp:Summary /maxcpucount
**********************************************************************
** Visual Studio 2022 Developer Command Prompt v17.12.0-pre.3.0
** Copyright (c) 2022 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
BUILDTEST: powershell -NoProfile -ExecutionPolicy ByPass -NoLogo -Command "c:\gh\runtime\eng\common\msbuild.ps1"   c:\gh\runtime\src\tests\build.proj -warnAsError:0 /t:TestBuild /nodeReuse:false  /p:RestoreDefaultOptimizationDataPackage=false /p:PortableBuild=true  /p:UsePartialNGENOptimization=false /maxcpucount '/flp:Verbosity=normal;LogFile="c:\gh\runtime\artifacts\log\TestBuild_windows__x64__Checked.log"' '/flp1:WarningsOnly;LogFile="c:\gh\runtime\artifacts\log\TestBuild_windows__x64__Checked.wrn"' '/flp2:ErrorsOnly;LogFile="c:\gh\runtime\artifacts\log\TestBuild_windows__x64__Checked.err"' '/bl:"c:\gh\runtime\artifacts\log\TestBuild_windows__x64__Checked.binlog"'  "/p:TargetOS=windows" "/p:Configuration=Checked" "/p:TargetArchitecture=x64" "/p:RuntimeFlavor=coreclr" /nologo /verbosity:minimal /clp:Summary /maxcpucount

  [10:19:48.56] Restoring all packages...

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\test_dependencies_fs\test_dependencies.fsproj (in 1.69 sec).

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\test_dependencies\test_dependencies.csproj (in 178 ms).

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\CoreCLRTestLibrary\CoreCLRTestLibrary.csproj (in 298 ms).

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\GenerateHWIntrinsicTests\GenerateHWIntrinsicTests_Arm.csproj (in 170 ms).

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\GenerateHWIntrinsicTests\GenerateHWIntrinsicTests_General.csproj (in 165 ms).

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\GenerateHWIntrinsicTests\GenerateHWIntrinsicTests_X86.csproj (in 163 ms).

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\XUnitLogChecker\XUnitLogChecker.csproj (in 247 ms).

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\XUnitWrapperGenerator\XUnitWrapperGenerator.csproj (in 608 ms).

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\XUnitWrapperLibrary\XUnitWrapperLibrary.csproj (in 159 ms).

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\XHarnessRunnerLibrary\XHarnessRunnerLibrary.csproj (in 211 ms).
    1 of 2 projects are up-to-date for restore.

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\external\external.csproj (in 370 ms).

    Determining projects to restore...
    Restored c:\gh\runtime\src\tests\Common\ilasm\ilasm.ilproj (in 199 ms).
  [10:20:29.07] Restoring all packages...Done.
  Building Targeting Pack
  external ->
  Determining projects to restore...
  Restored c:\gh\runtime\src\tools\illink\src\linker\ref\Mono.Linker.csproj (in 220 ms).
  Restored c:\gh\runtime\src\coreclr\tools\aot\crossgen2\crossgen2_publish.csproj (in 220 ms).
  Restored c:\gh\runtime\src\coreclr\tools\aot\ILCompiler.ReadyToRun\ILCompiler.ReadyToRun.csproj (in 227 ms).
  Restored c:\gh\runtime\src\coreclr\tools\aot\ILCompiler.Diagnostics\ILCompiler.Diagnostics.csproj (in 220 ms).
  Restored c:\gh\runtime\src\installer\pkg\sfx\Microsoft.NETCore.App\Microsoft.NETCore.App.Crossgen2.sfxproj (in 220 ms).
  Restored c:\gh\runtime\src\coreclr\tools\aot\ILCompiler.TypeSystem\ILCompiler.TypeSystem.csproj (in 249 ms).
  Restored c:\gh\runtime\src\tools\illink\src\linker\Mono.Linker.csproj (in 249 ms).
  Restored c:\gh\runtime\src\coreclr\tools\aot\ILCompiler.DependencyAnalysisFramework\ILCompiler.DependencyAnalysisFramework.csproj (in 249 ms).
  Restored c:\gh\runtime\src\tools\illink\src\ILLink.RoslynAnalyzer\ILLink.RoslynAnalyzer.csproj (in 290 ms).
  Restored c:\gh\runtime\src\libraries\System.Runtime.InteropServices\gen\DownlevelLibraryImportGenerator\DownlevelLibraryImportGenerator.csproj (in 290 ms).
  Restored c:\gh\runtime\src\tools\illink\src\ILLink.CodeFix\ILLink.CodeFixProvider.csproj (in 295 ms).
  Restored c:\gh\runtime\src\libraries\System.Runtime.InteropServices\gen\Microsoft.Interop.SourceGeneration\Microsoft.Interop.SourceGeneration.csproj (in 297 ms).
  Restored c:\gh\runtime\src\tools\illink\src\ILLink.Tasks\ILLink.Tasks.csproj (in 428 ms).
  ILLink.RoslynAnalyzer -> c:\gh\runtime\artifacts\bin\ILLink.RoslynAnalyzer\x64\Release\netstandard2.0\ILLink.RoslynAnalyzer.dll
  ILLink.CodeFixProvider -> c:\gh\runtime\artifacts\bin\ILLink.CodeFixProvider\x64\Release\netstandard2.0\ILLink.CodeFixProvider.dll
  Mono.Linker -> c:\gh\runtime\artifacts\bin\Mono.Linker\ref\x64\Release\net9.0\illink.dll
  Mono.Linker -> c:\gh\runtime\artifacts\bin\Mono.Linker\x64\Release\net9.0\win-x64\illink.dll
  ILLink.Tasks -> c:\gh\runtime\artifacts\bin\ILLink.Tasks\x64\Release\net9.0\win-x64\ILLink.Tasks.dll
  Mono.Linker -> c:\gh\runtime\artifacts\bin\Mono.Linker\x64\Release\net9.0\illink.dll
  ILLink.Tasks -> c:\gh\runtime\artifacts\bin\ILLink.Tasks\x64\Release\net9.0\ILLink.Tasks.dll
  ILCompiler.DependencyAnalysisFramework -> c:\gh\runtime\artifacts\bin\ILCompiler.DependencyAnalysisFramework\x64\Checked\ILCompiler.DependencyAnalysisFramework.dll
  ILCompiler.TypeSystem -> c:\gh\runtime\artifacts\bin\ILCompiler.TypeSystem\x64\Checked\ILCompiler.TypeSystem.dll
  ILCompiler.Diagnostics -> c:\gh\runtime\artifacts\bin\ILCompiler.Diagnostics\x64\Checked\ILCompiler.Diagnostics.dll
  ILCompiler.ReadyToRun -> c:\gh\runtime\artifacts\bin\ILCompiler.ReadyToRun\x64\Checked\ILCompiler.ReadyToRun.dll
  crossgen2_publish -> c:\gh\runtime\artifacts\bin\crossgen2_publish\x64\Checked\crossgen2.dll
c:\gh\runtime\artifacts\bin\coreclr\windows.x64.Checked\build\Microsoft.NETCore.Native.Publish.targets(77,5): error : The PrivateSdkAssemblies ItemGroup is required for _ComputeAssembliesToCompileToNative [c:\gh\runtime\src\corec
lr\tools\aot\crossgen2\crossgen2_publish.csproj]
  Determining projects to restore...
  All projects are up-to-date for restore.

Build FAILED.

c:\gh\runtime\artifacts\bin\coreclr\windows.x64.Checked\build\Microsoft.NETCore.Native.Publish.targets(77,5): error : The PrivateSdkAssemblies ItemGroup is required for _ComputeAssembliesToCompileToNative [c:\gh\runtime\src\corec
lr\tools\aot\crossgen2\crossgen2_publish.csproj]
    0 Warning(s)
    1 Error(s)

Time Elapsed 00:00:59.19
Build failed with exit code 1. Check errors above.

Note that this is documented by https://github.com/dotnet/runtime/blob/main/docs/workflow/testing/coreclr/testing.md#building-the-core_root

This was possibly broken by https://github.com/dotnet/runtime/pull/106965?

@am11 @jkotas

BruceForstall commented 3 weeks ago

@agocke

BruceForstall commented 3 weeks ago

I see some CI jobs use /p:UsePublishedCrossgen2=false which causes the script to succeed (and maybe does less work, so might be faster, which is good).

am11 commented 3 weeks ago

Those CI legs are separate SPMI related mostly, because they don't require crossgen2. This is passing in CI apparently.

ivdiazsa commented 3 weeks ago

Trying to reproduce it on the latest runtime. Just yesterday, I was able to generate the CORE_ROOT just fine with a few days old clone and it worked, at least on Linux.

agocke commented 3 weeks ago

Tried this locally using the exact instructions at https://github.com/dotnet/runtime/blob/main/docs/workflow/testing/coreclr/testing.md#building-the-core_root, can't repro.

jkotas commented 3 weeks ago

@BruceForstall This error means that your build does not have the live native AOT compiler that is needed by src\tests\build.cmd now. What are the command that you use to build the runtime?

ivdiazsa commented 3 weeks ago

Was able to reproduce, but it seems to be an expected failure. The trigger to see the error is to not build the clr.aot subset, which I assume is a requirement to generate the CORE_ROOT as per @jkotas's comment.

For reference, I used these commands to trigger the error:

./build.sh --subset host.native+clr.runtime+clr.corelib+clr.nativecorelib+clr.tools+clr.iltools+clr.alljits+clr.spmi+libs --configuration Release --runtimeConfiguration Checked

./src/tests/build.sh -x64 -checked -generatelayoutonly
jkotas commented 3 weeks ago

Yes, it is a new requirement after #106965.

There is a separate discussion we can have on whether it is a good idea to depend on live aot toolchain or whether it would be better to depend on the aot toolchain from the SDK (https://github.com/dotnet/runtime/pull/106965#issuecomment-2315493337 and following comments).

BruceForstall commented 3 weeks ago

Ok, so there are two solutions:

  1. If building using subsets, include the clr.aot subset.
  2. invoke src\tests\build.cmd generatelayoutonly with /p:UsePublishedCrossgen2=false (I don't understand the implications, but it also makes generatelayoutonly much faster)

Another solution might be using what Jan suggests above.

VSadov commented 1 week ago

I am hitting the same problem.

C:\Program Files\dotnet\sdk\9.0.100-rc.1.24452.12\Microsoft.Common.CurrentVersion.targets(5322,5): error MSB3030: Could not copy the file "E:\dotnet004\runtime\artifacts\obj\coreclr\crossg
en2_publish\windows.x64.Checked\apphost.exe" because it was not found. [E:\dotnet004\runtime\src\coreclr\tools\aot\crossgen2\crossgen2_publish.csproj]

As much as I try to read into the messages above, I am not sure I can see what the correct solution is.

I see that I could use /p:UsePublishedCrossgen2=false, with unknown implications. Or I could do . . . what ?

For reference. The way that no longer works is:

build.cmd clr+libs+libs.tests -rc checked -lc release & src\tests\build.cmd checked

I've tried also doing

build.cmd clr.aot -rc checked -lc release

it does not seem to help.

ivdiazsa commented 1 week ago

I am hitting the same problem.

C:\Program Files\dotnet\sdk\9.0.100-rc.1.24452.12\Microsoft.Common.CurrentVersion.targets(5322,5): error MSB3030: Could not copy the file "E:\dotnet004\runtime\artifacts\obj\coreclr\crossg
en2_publish\windows.x64.Checked\apphost.exe" because it was not found. [E:\dotnet004\runtime\src\coreclr\tools\aot\crossgen2\crossgen2_publish.csproj]

Ok something broke recently. I'm now getting that apphost issue even on Linux. Will get to look into it as it also reproduced on a fresh clone of today's upstream.

VSadov commented 1 week ago

CI machines are relatively clean and have particular configurations. It is likely that some difference on my machine is causing this - but what?

Regular git clean -xdf, delete global nuget cache, update VS, reboot..., generally help when something got stuck.

I am now adding /p:UsePublishedCrossgen2=false to everything, but not sure it is a good thing or, if it is the only way, why this is not the default.

ivdiazsa commented 1 week ago

I've found the problem. It is a combination of factors.

This code snippet looks for the apphost in the HostConfiguration path. Building the product like we usually do (build.sh -s clr+libs -rc Checked -lc Release) ends up producing a Debug apphost. This is totally unintuitive, since we built the runtime in Checked and the libraries in Release. Moreover, we never mentioned the host, and we have a host subset as well, so it is not intuitive that it will be built in the first place.

With this context in mind, we can take a look at the following snippet. It looks for the apphost first in the HostConfiguration path, which is Checked in this case, and if it doesn't exist, it looks in the LibrariesConfiguration, which is Release. But in this case, neither exist because the initial build created it in Debug as I explained in the previous paragraph. And hence, this problem happens.

The workaround is to pass -p:HostConfiguration=Debug for this specific case. The problem is we have to somehow find the host. We could add an additional check at the end where if <_apphostPath> is still empty after looking for it on the paths it currently looks into, then look into the three potential configuration paths:

Or, we can do additional calculations to find which configuration(s) we haven't checked and look into that/those one(s) only instead.

    <PropertyGroup Condition="'$(UsePublishedCrossgen2)' == 'true'">
      <_targetOS>$(TargetOS)</_targetOS>
      <_targetOS Condition="'$(_targetOS)' == 'windows'">win</_targetOS>
      <_apphostPath Condition="Exists('$(ArtifactsBinDir)$(_targetOS)-$(TargetArchitecture).$(HostConfiguration)\corehost\apphost$(ExeSuffix)')">$(ArtifactsBinDir)$(_targetOS)-$(TargetArchitecture).$(HostConfiguration)\corehost\apphost$(ExeSuffix)</_apphostPath>
      <_apphostPath Condition="'$(_apphostPath)' == '' and Exists('$(ArtifactsBinDir)$(_targetOS)-$(TargetArchitecture).$(LibrariesConfiguration)\corehost\apphost$(ExeSuffix)')">$(ArtifactsBinDir)$(_targetOS)-$(TargetArchitecture).$(LibrariesConfiguration)\corehost\apphost$(ExeSuffix)</_apphostPath>
      <_toolsConfiguration Condition="Exists('$(ArtifactsBinDir)ILLink.Tasks\$(ToolsConfiguration)\$(NetCoreAppToolCurrent)\ILLink.Tasks.dll')">$(ToolsConfiguration)</_toolsConfiguration>
      <_toolsConfiguration Condition="Exists('$(ArtifactsBinDir)ILLink.Tasks\$(LibrariesConfiguration)\$(NetCoreAppToolCurrent)\ILLink.Tasks.dll')">$(LibrariesConfiguration)</_toolsConfiguration>
    </PropertyGroup>

    <MakeDir Condition="'$(UsePublishedCrossgen2)' == 'true'" Directories="$(ArtifactsObjDir)coreclr\crossgen2_publish\$(TargetOS).$(TargetArchitecture).$(RuntimeConfiguration)" />

    <Copy
      SourceFiles="$(_apphostPath)"
      DestinationFiles="$(ArtifactsObjDir)coreclr\crossgen2_publish\$(TargetOS).$(TargetArchitecture).$(RuntimeConfiguration)\apphost$(ExeSuffix)"
      Condition="'$(UsePublishedCrossgen2)' == 'true'" />
VSadov commented 1 week ago

Based on the above, I tried the following (after git clean -xdf):

build.cmd clr+libs+libs.tests -c release -rc checked & src\tests\build.cmd checked

It worked for me!

ivdiazsa commented 1 week ago

Based on the above, I tried the following (after git clean -xdf):

build.cmd clr+libs+libs.tests -c release -rc checked & src\tests\build.cmd checked

It worked for me!

Good to hear. I would've expected it to work like this. The -c Release qualifies everything except the runtime since you also passed -rc Checked. Which means the apphost was build in Release and therefore it was found by the clause that searches in LibrariesConfiguration.

We also have to document this while a definite fix is worked on. It is a pitfall very easily fallen into. I'll add that as well alongside the temporary fix.

am11 commented 1 week ago

It's working because that's what CI is using and that is documented. -lc Release -rc Checked without specifying host or tools configuration is a recipient of disaster and unnecessary combination to care for.

ivdiazsa commented 1 week ago

It's working because that's what CI is using and that is documented. -lc Release -rc Checked without specifying host or tools configuration is a recipient of disaster and unnecessary combination to care for.

It's not that unknown since it's not the first time someone falls into it. So, we will document it.

VSadov commented 1 week ago

As another point. I see the same behavior on arm64-linux

CORE_ROOT does not build due to apphost missing if -rc checked -lc release was used. The combination -c release -rc checked does not have the problem

So @ivdiazsa seems right with the theory that host defaults to Debug and thus cannot be found. We also see that this is not Windows-only or x64-only issue.

-c release -rc checked would be acceptable solution for me even as a long term.

VSadov commented 1 week ago

I kind of always assumed that -rc checked -lc release is a shortcut for -c debug -rc checked -lc release i.e. partition config effectively starts with -c debug unless -c somethingElse is specified

ivdiazsa commented 1 week ago

As another point. I see the same behavior on arm64-linux

CORE_ROOT does not build due to apphost missing if -rc checked -lc release was used. The combination -c release -rc checked does not have the problem

So @ivdiazsa seems right with the theory that host defaults to Debug and thus cannot be found. We also see that this is not Windows-only or x64-only issue.

-c release -rc checked would be acceptable solution for me even as a long term.

Yup it's universal. I'll add the fix I proposed and add it to the docs so that it is clear. It is not intuitive that you should also have to use the universal -c when you're already giving all the specific ones for those subsets. Yes, it means -c Debug if you omit it, but why would that have an impact on clr+libs if you're already giving -rc and -lc? It is not intuitive but that's how the build works. Don't worry, I'll document it along with fixing this specific case.

am11 commented 1 week ago

It's not that unknown since it's not the first time someone falls into it. So, we will document it.

If it is important, it should be part of the CI. Conversely, since it is not part of the CI and those 500+ legs in CI do exercise all scenarios which are relevant today, it doesn't seem important.

ivdiazsa commented 1 week ago

It's not that unknown since it's not the first time someone falls into it. So, we will document it.

If it is important, it should be part of the CI. Conversely, since it is not part of the CI and those 500+ legs in CI do exercise all scenarios which are relevant today, it doesn't seem important.

CI costs to run. We can't expect to pay for every case that might happen out there, only the absolute most relevant. This is also confusing hence we will document it appropriately.

VSadov commented 1 week ago

Ideally if some combinations are known to be problematic, it should be an error. I guess it is plausible that someone might want "problematic" combination on purpose, so maybe a warning. At least it should not be defaulted to.

src\tests\build takes a fairly long time and then fails with an error that is a result of something I did 2 steps back in the build. That typically leads to "lets clean everything up carefully and try again", thus more time is wasted...

Documenting might not be as helpful, as some hand-holding on the part of the build script. Many of us, I suspect, do not re-read docs regularly enough to pick up good practice updates.

I.E. it was not my intent to use debug host. For the thing I need to test, host flavor is irrelevant. I just use something that worked before. If the pattern is known to cause troubles, it might be better if it just errored/warned that what I do is not a good idea.

jkotas commented 1 week ago

-lc Release -rc Checked without specifying host or tools configuration is a recipient of disaster and unnecessary combination to care for.

This configuration is documented in number of places and worked just fine up until recently

https://github.com/dotnet/runtime/blob/main/eng/build.ps1#L114 https://github.com/dotnet/runtime/blob/4621bb61cf76ebffc2dd9ae4289844d2c36dd545/docs/workflow/testing/host/testing.md?plain=1#L86

I do not see why it should be unsupported.

am11 commented 1 week ago

@jkotas first one is:

Write-Host "* Build Release libraries and their tests with a Checked runtime for Windows x64, and run the tests." Write-Host ".\build.cmd clr+libs+libs.tests -rc checked -lc release -test"

it's not broken and working as expected.

Second one is:

build.cmd -vs Microsoft.DotNet.CoreSetup -rc Release -lc Release

It is also not broken and working as expected.

The problematic part is; with this, the corehost and tools are built with default (Debug) configuration and we try generating layout for runtime tests.

The larger issue is src/tests/build was never integrated with rest of the centralized build and the defaults are flipped.

i.e. for:

./build.sh clr+libs -rc Checked
./build.sh clr+libs -rc Checked -c Release

the respective test commands should be:

src/tests/build.sh -p:RuntimeConfiguration=Checked
src/tests/build.sh -Release -p:RuntimeConfiguration=Checked

However, since we have the defaults flipped, we are using:

src/tests/build.sh -Checked -p:LibrariesConfiguration=Debug
src/tests/build.sh -Checked -p:LibrariesConfiguration=Release