dotnet / diagnostics

This repository contains the source code for various .NET Core runtime diagnostic tools and documents.
MIT License

Managed component fails to build under FreeBSD #2381

Closed Thefrank closed 3 years ago

Thefrank commented 3 years ago

Description

When trying to build the repo under FreeBSD 12.2 + LLVM9/10/11, the managed part fails with Build failed (exit code '139'). The binlog generated when using -ci is incomplete, as it appears something terminates during the process. I would have attached the .binlog, but that file type is not supported on this form.

I tried both HEAD and tags/v5.0.152202

Configuration

What OS and version, and what distro if applicable?

What is the architecture (x64, x86, ARM, ARM64)?

Do you know whether it is specific to that configuration?

Are you running in any particular type of environment? (e.g. Containers, a cloud scenario, app you are trying to target is a different user)

What's the output of dotnet info

root@dotnet6:~/diagnostics # ./.dotnet/dotnet --info
.NET SDK (reflecting any global.json):
 Version:   6.0.100-preview.2.21155.3
 Commit:    1a9103db2d

Runtime Environment:
 OS Name:     FreeBSD
 OS Version:  12
 OS Platform: FreeBSD
 RID:         freebsd.12-x64
 Base Path:   /root/diagnostics/.dotnet/sdk/6.0.100-preview.2.21155.3/

Host (useful for support):
  Version: 6.0.0-preview.2.21154.6
  Commit:  3eaf1f316b

.NET SDKs installed:
  3.1.111 [/root/diagnostics/.dotnet/sdk]
  5.0.203 [/root/diagnostics/.dotnet/sdk]
  6.0.100-preview.1.21103.13 [/root/diagnostics/.dotnet/sdk]
  6.0.100-preview.2.21155.3 [/root/diagnostics/.dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 3.1.11 [/root/diagnostics/.dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 5.0.6 [/root/diagnostics/.dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.0-preview.1.21103.6 [/root/diagnostics/.dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.0-preview.2.21154.6 [/root/diagnostics/.dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 3.1.10 [/root/diagnostics/.dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 5.0.6 [/root/diagnostics/.dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.0-preview.1.21102.12 [/root/diagnostics/.dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.0-preview.2.21154.6 [/root/diagnostics/.dotnet/shared/Microsoft.NETCore.App]

To install additional .NET runtimes or SDKs:
  https://aka.ms/dotnet-download

Regression?

As I am building this for the first time and cannot find a copy of SOS or diagnostics for FreeBSD... I am not sure. There is a good amount of native code in place to work around various FreeBSD-specific quirks.

Other information

This might be a different issue, but: the native section + lldb plugin builds after a large number of workarounds for LLVM9+ but produces a non-working libsos.so due to a missing symbol, objdump/grep'd here:

root@dotnet6:~ # objdump -t diagnostics/artifacts/bin/FreeBSD.x64.Debug/libsos.so | grep _Z12TryGetSymbolmPKcPm
0000000000000000         *UND*  0000000000000000              _Z12TryGetSymbolmPKcPm

Hopefully related to managed not building.

The libsosplugin.so for lldb appears to be correct.
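
For readers unfamiliar with mangled names, the undefined symbol above can be decoded with the Itanium C++ ABI demangler that ships with the GCC and Clang runtimes. The sketch below is illustrative only; the helper name Demangle is hypothetical and not part of this repo. On a 64-bit target where uint64_t is unsigned long, the symbol decodes to TryGetSymbol(unsigned long, char const*, unsigned long*).

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (provided by the GCC and Clang C++ runtimes)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged if the name cannot be demangled.
std::string Demangle(const char* mangled)
{
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr)
    {
        std::free(text);
        return mangled;
    }
    std::string result(text);
    std::free(text);  // the buffer is malloc-allocated by __cxa_demangle
    return result;
}
```

Calling Demangle("_Z12TryGetSymbolmPKcPm") shows which function libsos.so expects some other library to provide.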

hoyosjs commented 3 years ago

We don't have official support for FreeBSD, so I am not particularly surprised this broke. That being said, this would be good to look at, at least at some point. So the native build works (./build.sh -skipmanaged)? _Z12TryGetSymbolmPKcPm missing happens if we don't build the symbol reader for the platform, and I see we only have an ELF one here https://github.com/dotnet/diagnostics/blob/main/src/SOS/dbgutil/elfreader.cpp.

As for the managed component, we'd need to see where/how it fails. I am not sure how it restores the installer, given that we depend on downloading from an official source and AFAIK we don't build a FreeBSD installer.

mikem8361 commented 3 years ago

I think the missing export is because elfreader.cpp is only compiled into dbgutil for OSX and LINUX build defines which are not set for FreeBSD. The ELF reader wasn't built or tested for FreeBSD. It may not be much work to get it to work but we currently don't have the resources for 6.0 to do this.

Thefrank commented 3 years ago

@hoyosjs / @mikem8361 thanks for getting back to me on this!

Native builds (./build.sh -skipmanaged -clang11.0), but I had to make CMake not add in the elfreader, as the Link_map that it uses is exclusive to the Linux kernel AFAIK (this covers the specifics for the UNIX-like systems: https://github.com/dotnet/runtime/issues/14537#issuecomment-864348910; the rest of the 250+ replies are there if you want to go down a 5-year rabbit hole of getting FreeBSD+dotNET working, hah).

I was also able to get a limited truss

build.truss.txt

The command used was truss -H -o build.truss ./.dotnet/dotnet msbuild /nologo -logger:/root/.nuget/packages/microsoft.dotnet.arcade.sdk/6.0.0-beta.20515.7/tools/netcoreapp2.1/Microsoft.DotNet.Arcade.Sdk.dll -maxcpucount /m -verbosity:m /v:minimal /bl:/root/diagnostics/artifacts/log/Debug/Build.binlog /clp:Summary /nr:false /p:TreatWarningsAsErrors=true /p:ContinuousIntegrationBuild=true /p:Configuration=Debug /p:RepoRoot=/root/diagnostics /p:Restore=true /p:Build=true /p:Rebuild=false /p:Test=false /p:Pack=false /p:IntegrationTest=false /p:PerformanceTest=false /p:Sign=false /p:Publish=false /warnaserror /root/.nuget/packages/microsoft.dotnet.arcade.sdk/6.0.0-beta.20515.7/tools/Build.proj

-f seems to cause truss to hang :(

and a truss for the build script too

truss -f -H -o script.truss ./build.sh -skipnative --clang11.0

script.truss.txt


I was hoping to at least get something set up so the FreeBSD community can figure out why net6 past preview 2 is segfaulting without resorting to building commits until we find what breaks it.

I can make a pull request (or two different ones) if you are taking them: 1) update the build script's ability to handle clang > 9, and 2) work around FreeBSD-specific CMake quirks when trying to detect sysctl outside of Linux.

hoyosjs commented 3 years ago

Elfreader needs to either be included and fixed or properly stubbed out. Runtime shouldn't need it as we have this guard: https://github.com/dotnet/runtime/blob/d2daf0b96c29d0c3f1070ac93fc351237133bcd0/src/coreclr/debug/daccess/daccess.cpp#L7231 and BSD variants fall under that guard: https://github.com/dotnet/runtime/blob/066894e0b74fc5ecbff95fe37207caa269d5695d/src/coreclr/debug/daccess/CMakeLists.txt#L46. We'd need it if we intend to support single file tools. Createdump is something that won't compile in the runtime given this, as it uses TryLookupSymbol, which is part of the reader. SOS also needs an exclusion of DBGUtil and stubbed out call sites until the capability to read the symbols gets added. As for the managed side, that's a little hard to tell; the easiest way is to skip the managed build and directly build the native SOS.
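
As a rough illustration of what "properly stubbed out" could look like, here is a minimal sketch of a placeholder for the missing export. The parameter types are inferred from the mangled name _Z12TryGetSymbolmPKcPm; the bool return type and the parameter names are assumptions, since a mangled function name does not encode the return type. This is not the repo's actual code, and a real fix would also need the build to select the stub only on platforms without a symbol reader.

```cpp
#include <cstdint>

// Hypothetical stub for platforms where no symbol reader is built.
// It always reports "symbol not found" so callers can fall back gracefully
// instead of libsos.so failing to load on an unresolved symbol.
bool TryGetSymbol(uint64_t baseAddress, const char* symbolName, uint64_t* symbolAddress)
{
    (void)baseAddress;  // unused: there is no reader to consult
    (void)symbolName;   // unused: no lookup is attempted
    if (symbolAddress != nullptr)
    {
        *symbolAddress = 0;  // leave a well-defined value for the caller
    }
    return false;
}
```

With a stub of this shape the link step would succeed, but any feature that depends on symbol lookup would remain unavailable until a working reader is added.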

Thefrank commented 3 years ago

I was able to look into this further, and it appears that the segfaulting of the build during the restore process of the managed component is .NET6 related and not specific to this repo.

The managed part builds with no issues, after some modifications for it to know what FreeBSD is, when passing -maxcpucount:1 on any branch (or HEAD) that uses .NET6. The branches that use .NET5 build with (similar) modifications, but without the need for -maxcpucount:1 (and build much, much faster).

Before I close this out, where should I recreate the issue? MSBuild? Runtime? Someplace else?

wfurt commented 3 years ago

I had a similar experience in the past. It may need to start with msbuild. But since there is nothing really platform-specific there, the root cause probably lives deep in the runtime, probably in the handling of processes or IO on pipes.

hoyosjs commented 3 years ago

If you are able to capture a dump or a call stack at the crash site, that'd probably help a lot. But yeah, a SEGV is likely gonna be a runtime issue.

Thefrank commented 3 years ago

I should have some time tomorrow/early this week. I will try and get some better debugging done on it.

If it is runtime, would fixes make it before the cutoff for net6?

hoyosjs commented 3 years ago

Sorry @Thefrank, didn't see this had a question. Not really, net6.0 has slowed down for a while and it's unlikely any change there would make it to the official 6.0 release.

hoyosjs commented 3 years ago

Closing this as it's a restore issue. If you get any more data on the crash let me know and we can route it accordingly.