Open am11 opened 4 years ago
I'd like to compare the behavior of this same code running on Linux (under gdb). Unfortunately, my (native) build on Linux is not completing. See: https://gist.github.com/gwr/3520dfbf14190e9225e8214f434ca38e/raw/LinuxBuild01.txt Can anyone suggest what's going wrong with that build? Thanks!
File not found: '/g/ws/dotnet/runtime/THIRD-PARTY-NOTICES.TXT'. [/g/ws/dotnet/runtime/src/coreclr/.nuget/Microsoft.NETCore.ILAsm/Microsoft.NETCore.ILAsm.pkgproj]
That file definitely exists, right? It's the intermittent issue with nuget https://github.com/NuGet/Home/issues/13572 (too many inodes). Just rebuild the packs subset with ./build.sh packs -c Debug -gcc --keepnativesymbols true a few times until it builds the .tar.gz we are interested in. 😅
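The "rebuild a few times" advice can be wrapped in a small retry loop. This is only a sketch: the `retry` helper name and the attempt count are made up for illustration, and it assumes the failure really is intermittent rather than a genuine build error.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper (not part of the dotnet/runtime repo): re-run COMMAND
# until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

It could then be used as `retry 5 ./build.sh packs -c Debug -gcc --keepnativesymbols true`, stopping at the first successful build.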
Yeah, that doesn't seem to be working for me. It keeps failing the nuget steps. Anything else I can do to work around that on this Linux VM?
You can directly use corerun (an internal test host) instead of dotnet.
$ cd runtime
$ src/tests/build.sh -generatelayoutonly -p:LibrariesConfiguration=Debug
then:
$ gdb --args artifacts/tests/coreclr/linux.x64.Debug/Tests/Core_Root/corerun \
../helloworld/bin/Debug/net9.0/helloworld.dll
@gwr, sometimes we also have stray dotnet processes; killing them helps (to reclaim the inodes and other resources): pkill -KILL dotnet
Thanks. The test/coreclr thing did what I needed. With that and comparing behaviors, I believe I have a good fix to get rid of the need for the DOTNET_GCHeapHardLimit override. Pushed to:
https://github.com/dotnet/runtime/compare/main...gwr:dotnet-runtime:illumos1
Now that helloworld is working OK, can you please remind me what test and debug steps to take next? eg. on System.Diagnostics.Process? For starters, after I build, I don't see an illumos dll in the artifacts. Help, @am11 ? Are you on matrix.org by any chance? (element IRC)
@gwr https://github.com/dotnet/runtime/issues/34944#issuecomment-2197520665 has a rough sketch.
Unless illumos and solaris differ, we can keep it under sunos rather than separate. Replace src/libraries/System.Diagnostics.Process/src/System.Diagnostics.Process.csproj with https://gist.github.com/am11/4b943df8712c6ce257a22b3aafad29f7. Basically I made a copy of the freebsd lines with sunos. Of course you will need to create those files physically as well for the project build to succeed. :)
I could still use some pointers on how to attempt a build of these libs:
src/libraries/System.Diagnostics.Process/src/System.Diagnostics.Process.csproj
src/libraries/System.IO.FileSystem.Watcher/src/System.IO.FileSystem.Watcher.csproj
src/libraries/System.Net.Security/src/System.Net.Security.csproj
(@am11?) Thanks!
@gwr, my previous comment has the starting point. The prereq is to understand what other platform implementations are doing to determine which features are needed and which stack is suitable. You may find feature disparity across platforms in a few cases, so this work also requires understanding what cannot be implemented in terms of public facing APIs and marking those APIs with attributes like [UnsupportedOSPlatformGuard("illumos"), UnsupportedOSPlatformGuard("solaris")].
I've done some C# and can look at and understand what the other platforms are doing.
However, when I try to build System.Diagnostics.Process, nothing even appears to attempt building anything for illumos. I guess maybe there's some configuration stuff (cmake?) that needs to change?
Here's what I see:
gwr@ubuntu18:/g/ws/dotnet/runtime$ ./dotnet.sh build -p:TargetOS=illumos src/libraries/System.Diagnostics.Process/src
Determining projects to restore...
All projects are up-to-date for restore.
ILLink.RoslynAnalyzer -> /g/ws/dotnet/runtime/artifacts/bin/ILLink.RoslynAnalyzer/Debug/netstandard2.0/ILLink.RoslynAnalyzer.dll
ILLink.CodeFixProvider -> /g/ws/dotnet/runtime/artifacts/bin/ILLink.CodeFixProvider/Debug/netstandard2.0/ILLink.CodeFixProvider.dll
ILCompiler.DependencyAnalysisFramework -> /g/ws/dotnet/runtime/artifacts/bin/ILCompiler.DependencyAnalysisFramework/Debug/ILCompiler.DependencyAnalysisFramework.dll
Mono.Linker -> /g/ws/dotnet/runtime/artifacts/bin/Mono.Linker/ref/Debug/net9.0/illink.dll
Mono.Linker -> /g/ws/dotnet/runtime/artifacts/bin/Mono.Linker/Debug/net9.0/illink.dll
ILLink.Tasks -> /g/ws/dotnet/runtime/artifacts/bin/ILLink.Tasks/Debug/net9.0/ILLink.Tasks.dll
Microsoft.Interop.SourceGeneration -> /g/ws/dotnet/runtime/artifacts/bin/Microsoft.Interop.SourceGeneration/Debug/netstandard2.0/Microsoft.Interop.SourceGeneration.dll
LibraryImportGenerator -> /g/ws/dotnet/runtime/artifacts/bin/LibraryImportGenerator/Debug/netstandard2.0/Microsoft.Interop.LibraryImportGenerator.dll
ComInterfaceGenerator -> /g/ws/dotnet/runtime/artifacts/bin/ComInterfaceGenerator/Debug/netstandard2.0/Microsoft.Interop.ComInterfaceGenerator.dll
ILLink.RoslynAnalyzer -> /g/ws/dotnet/runtime/artifacts/bin/ILLink.RoslynAnalyzer/Debug/netstandard2.0/ILLink.RoslynAnalyzer.dll
System.Runtime -> /g/ws/dotnet/runtime/artifacts/bin/System.Runtime/ref/Debug/net9.0/System.Runtime.dll
System.ComponentModel -> /g/ws/dotnet/runtime/artifacts/bin/System.ComponentModel/ref/Debug/net9.0/System.ComponentModel.dll
System.Diagnostics.FileVersionInfo -> /g/ws/dotnet/runtime/artifacts/bin/System.Diagnostics.FileVersionInfo/ref/Debug/net9.0/System.Diagnostics.FileVersionInfo.dll
System.Collections -> /g/ws/dotnet/runtime/artifacts/bin/System.Collections/ref/Debug/net9.0/System.Collections.dll
System.Collections.NonGeneric -> /g/ws/dotnet/runtime/artifacts/bin/System.Collections.NonGeneric/ref/Debug/net9.0/System.Collections.NonGeneric.dll
System.ObjectModel -> /g/ws/dotnet/runtime/artifacts/bin/System.ObjectModel/ref/Debug/net9.0/System.ObjectModel.dll
System.Runtime.InteropServices -> /g/ws/dotnet/runtime/artifacts/bin/System.Runtime.InteropServices/ref/Debug/net9.0/System.Runtime.InteropServices.dll
System.ComponentModel.Primitives -> /g/ws/dotnet/runtime/artifacts/bin/System.ComponentModel.Primitives/ref/Debug/net9.0/System.ComponentModel.Primitives.dll
System.Collections.Specialized -> /g/ws/dotnet/runtime/artifacts/bin/System.Collections.Specialized/ref/Debug/net9.0/System.Collections.Specialized.dll
System.Diagnostics.Process -> /g/ws/dotnet/runtime/artifacts/bin/System.Diagnostics.Process/ref/Debug/net9.0/System.Diagnostics.Process.dll
System.Diagnostics.Process -> /g/ws/dotnet/runtime/artifacts/bin/System.Diagnostics.Process/Debug/net9.0-ios/System.Diagnostics.Process.dll
System.Diagnostics.Process -> /g/ws/dotnet/runtime/artifacts/bin/System.Diagnostics.Process/Debug/net9.0-maccatalyst/System.Diagnostics.Process.dll
System.Diagnostics.Process -> /g/ws/dotnet/runtime/artifacts/bin/System.Diagnostics.Process/Debug/net9.0-windows/System.Diagnostics.Process.dll
System.Diagnostics.Process -> /g/ws/dotnet/runtime/artifacts/bin/System.Diagnostics.Process/Debug/net9.0-linux/System.Diagnostics.Process.dll
System.Diagnostics.Process -> /g/ws/dotnet/runtime/artifacts/bin/System.Diagnostics.Process/Debug/net9.0-tvos/System.Diagnostics.Process.dll
System.Diagnostics.Process -> /g/ws/dotnet/runtime/artifacts/bin/System.Diagnostics.Process/Debug/net9.0-osx/System.Diagnostics.Process.dll
System.Diagnostics.Process -> /g/ws/dotnet/runtime/artifacts/bin/System.Diagnostics.Process/Debug/net9.0-freebsd/System.Diagnostics.Process.dll
System.Diagnostics.Process -> /g/ws/dotnet/runtime/artifacts/bin/System.Diagnostics.Process/Debug/net9.0/System.Diagnostics.Process.dll
Build succeeded.
0 Warning(s)
0 Error(s)
Time Elapsed 00:01:04.07
gwr@ubuntu18:/g/ws/dotnet/runtime$
Note there's no "illumos" in any of that. I want to make it at least try to build for illumos. What am I missing? Thanks again!
@gwr #34944 (comment) has a rough sketch.
Oh. Missed this. Thanks.
OK, I'm not much familiar with .csproj files. Thanks for the help with that.
Is there any guidance on the layout of things under:
src/libraries/Common/src/Interop/
Eg. `Linux/System.Native.vs
Linux/*.cs` and others.
What are good tests for these libraries etc? Instructions?
Oh yeah: Are these libraries necessary for self-hosting? (native build) My work would be easier once I can build native.
Thanks.
Is there any guidance on the layout of things under: src/libraries/Common/src/Interop/
its C code lives here: https://github.com/dotnet/runtime/blob/4ef65f869207154a4ad6a513bad798f8a96b7f61/src/native/libs/System.Native/pal_io.c#L1823
Linux procfs is a bit "special" (src/libraries/Common/src/Interop/Linux/procfs) because those are text files and we read them directly from C# without interop with C. illumos procfs is binary based, therefore we need the regular interop.
Oh yeah: Are these libraries necessary for self-hosting? (native build)
Yes; they are necessary to complete the shared framework (sfx), here is why:
- System.Diagnostics.Process (process spawning; execve etc.) is necessary for .NET SDK, msbuild, vstest etc. vstest folks at some point were discussing providing in-process execution, but I'm not sure of the status. As it stands, it's a hard requirement for any meaningful building of .NET projects.
- System.Net.Security is necessary for TLS based communication.
- System.IO.FileSystem.Watcher is something which shouldn't be necessary in principle, but I have seen in the past that aspnetcore (webapps) sometimes makes it a requirement during build; not sure about the current status. So it can be done after the first two.
OK, some progress here. Any test and debug tips? https://github.com/dotnet/runtime/compare/main...gwr:dotnet-runtime:illumos2
Testing is a bit tricky, since the test executor itself can spawn a child process and fail due to the classic chicken-egg situation (we are porting the System.Diagnostics.Process which implements process spawning). You can give it a try.
On linux:
$ ./dotnet.sh build -p:TargetOS=illumos -p:CrossBuild=true src/libraries/System.Diagnostics.Process/tests
Then copy artifacts/bin/System.Diagnostics.Process.Tests/Debug/net9.0-unix to the illumos machine, say ~/projects/runtime-tests/System.Diagnostics.Process.Tests. To run:
DOTNET_REMOTEEXECUTOR_SUPPORTED=0 dotnet \
~/projects/runtime-tests/System.Diagnostics.Process.Tests/Debug/net9.0-unix/xunit.console.dll \
~/projects/runtime-tests/System.Diagnostics.Process.Tests/Debug/net9.0-unix/System.Diagnostics.Process.Tests.dll \
-notrait category=nonillumostests -notrait category=nonsolaristests \
-notrait category=OuterLoop -notrait category=failing
If this complains about targetframework 9.0.0-preview... etc. replace it in xunit.console.runtimeconfig.json and System.Diagnostics.Process.Tests.runtimeconfig.json (as we did in helloworld.runtimeconfig.json earlier).
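The version replacement described above can be scripted. The following is a rough sketch, assuming the usual runtimeconfig.json layout in which the framework entry carries a `"version": "..."` string; the `set_framework_version` helper is hypothetical, and it rewrites every `"version"` value in the file, which is only appropriate for these small generated files.

```shell
#!/bin/sh
# set_framework_version FILE VERSION
# Hypothetical helper: rewrite each "version": "..." value in a
# runtimeconfig.json file. Assumes VERSION contains no '/' or '&',
# since it is pasted directly into the sed replacement.
set_framework_version() {
    file=$1
    version=$2
    tmp="$file.tmp.$$"
    # Write to a temp file and move it into place, to avoid relying on sed -i.
    sed 's/\("version"[[:space:]]*:[[:space:]]*"\)[^"]*"/\1'"$version"'"/' "$file" > "$tmp" &&
        mv "$tmp" "$file"
}
```

For example, `set_framework_version xunit.console.runtimeconfig.json 9.0.0-dev`, where `9.0.0-dev` is an assumed value; use whatever version `dotnet --list-runtimes` reports on the target.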
Once the ball starts rolling, you can look at [PlatformSpecific(TestPlatforms.Linux)] etc. which may be applicable on illumos, e.g. https://github.com/dotnet/runtime/blob/1fe7d189db4a49bc676ddb206456709e089c2293/src/libraries/System.Diagnostics.Process/tests/ProcessTests.cs#L1667 to include the platform (TestPlatforms.illumos and TestPlatforms.Solaris are the supported enum values). Similarly, the skip platform condition looks like:
https://github.com/dotnet/runtime/blob/1fe7d189db4a49bc676ddb206456709e089c2293/src/libraries/System.Diagnostics.Process/tests/ProcessTests.cs#L605
Thanks. I'm debugging. Is there a way to ask dotnet to pause during (or shortly after) initialization so I can attach to the process with gdb? It's difficult to get the environment and all the args setup if I let gdb actually try to start the program. I think I saw a pause for debug attach somewhere...
Eg. maybe like #2456 proposes?
Thanks
For managed (C#) code, it requires a few things:
- Already ported: HP libunwind (in-tree copy is at src/native/external/libunwind),
For native (C/C++/assembly) runtime code debugging, just set a breakpoint and continue or use something like while (true) { if (ptrace(PTRACE_TRACEME, 0, nullptr, nullptr) == -1) break; }
I'd use the poor man's printf-debugging technique (using Console.WriteLine("I'm here!"); etc. in C# and printf in C/C++) for now to get the base set of libraries ported.
Linux procfs is a bit "special" (src/libraries/Common/src/Interop/Linux/procfs) because those are text files and we read them directly from C# without interop with C. illumos procfs is binary based, therefore we need the regular interop.
BTW, SunOS and illumos have the same style of /proc/pid/* that Linux has. We should be able to do similarly as the Linux code if we want.
Last I checked it has a binary interface unlike linux, i.e. you can do stuff like cat /proc/$$/meminfo on linux but can't cat /proc/$$/psinfo on illumos, where it requires reading with structs.
Last I checked it has a binary interface unlike linux, i.e. you can do stuff like cat /proc/$$/meminfo on linux but can't cat /proc/$$/psinfo on illumos where it requires reading with structs
Ah right. Yeah, the content that flows over those file descriptors is not human readable. (and on the plus side, does not require any text parsing:)
Yup, note that interop layer also incurs some cost (it adds additional thunks / frames for marshaling). So reading it as text file in C# on linux with non-allocate-y text parsing is working ok. Also, System.Diagnostics.Process is not performance critical; i.e. end-users are most likely not going to put process spawning on performance-sensitive path in their code (so I believe correctness is more important than perf for this lib).
For managed (C#) code. It requires a few things.
- Already ported: HP libunwind (in-tree copy is at src/native/external/libunwind),
- gdb is not supported [...] so we need llvm-toolchain or just lldb,
We have most of llvm/clang (current is clang-18). I don't see the "lldb" debugger. I guess that's still todo.
... and libSOS, which has a lldbplugin [...] If llvm-toolchain is ported on illumos [...], we can bring it onboard. It will require some tweaking in rootfs toolchain etc. but it's a nontrivial task.
Hopefully we can stick with gcc for the rootfs toolchain for a while.
For native (C/C++/assembly) runtime code debugging, just set a breakpoint and continue or use something like [... ptrace, sleep, ...]
I've been doing that, but I'm having trouble coming up with a good place to put breaks, eg after all the exec and dll loading happens. Any suggestions where's a good place for a startup breakpoint?
Trying to debug with gdb looks like a lost cause
(gdb) run sdp-test/net9.0-unix/xunit.console.dll \
sdp-test/net9.0-unix/System.Diagnostics.Process.Tests.dll \
-notrait category=nonillumostests \
-notrait category=nonsolaristests \
-notrait category=OuterLoop \
-notrait category=failing
Starting program: /tank/ws/dnt/dotnet sdp-test/net9.0-unix/xunit.console.dll \
sdp-test/net9.0-unix/System.Diagnostics.Process.Tests.dll \
-notrait category=nonillumostests \
-notrait category=nonsolaristests \
-notrait category=OuterLoop \
-notrait category=failing
[Thread debugging using libthread_db enabled]
Thread 2 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
0x00007ffef94b8667 in ?? ()
(gdb) where
#0 0x00007ffef94b8667 in ?? ()
#1 0x0000000000000047 in ?? ()
#2 0x0000000000000001 in ?? ()
#3 0x0000000000000000 in ?? ()
(gdb)
Though if I continue, it does give me a backtrace of the C# code:
Continuing.
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at System.IO.Enumeration.FileSystemEnumerableFactory+<>c__DisplayClass2_0.<UserFiles>b__1(System.IO.Enumeration.FileSystemEntry ByRef)
at System.IO.Enumeration.FileSystemEnumerable`1+DelegateEnumerator[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].ShouldIncludeEntry(System.IO.Enumeration.FileSystemEntry ByRef)
at System.IO.Enumeration.FileSystemEnumerator`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
at System.Collections.Generic.List`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
at System.IO.Directory.GetFiles(System.String, System.String, System.IO.EnumerationOptions)
at System.IO.Directory.GetFiles(System.String, System.String)
at Xunit.ConsoleClient.ConsoleRunner.GetAvailableRunnerReporters()
at Xunit.ConsoleClient.ConsoleRunner.EntryPoint(System.String[])
at Xunit.ConsoleClient.Program.Main(System.String[])
Thread 2 received signal SIGABRT, Aborted.
0x00007fffaf3fb6aa in _lwp_kill () from /lib/64/libc.so.1
Is that all I have to work with here? (until lldb)
The exception stacktrace will show up without gdb as well. The exception is pointing to this method: https://github.com/dotnet/runtime/blob/64efe2654c8455e7591aa07e7e8505064f571fc4/src/libraries/System.Private.CoreLib/src/System/IO/Enumeration/FileSystemEnumerableFactory.cs#L114
You can probably repro it with helloworld app using this in Program.cs
EnumerationOptions options = new()
{
    IgnoreInaccessible = false,
    RecurseSubdirectories = true
};
foreach (var file in Directory.GetFiles("/tmp", "*", options))
{
    Console.WriteLine(file);
}
publish helloworld from linux, copy to illumos and run.
This is pretty much the same repro I wrote for https://github.com/dotnet/runtime/issues/104448 . With that fix, running xunit library tests works.
Sorry for not being more clear that was the problem that PR fixes, I was a bit rushed to get some 4th things.
Thanks. I pulled your fixes for #104447 and #104448 to my local working branch. Here's what I get now:
$ DOTNET_REMOTEEXECUTOR_SUPPORTED=0 \
./dotnet sdp-test/net9.0-unix/xunit.console.dll \
sdp-test/net9.0-unix/System.Diagnostics.Process.Tests.dll \
-notrait category=nonillumostests \
-notrait category=nonsolaristests \
-notrait category=OuterLoop \
-notrait category=failing
Fatal error. System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
at System.IO.Enumeration.FileSystemEnumerableFactory+<>c__DisplayClass2_0.<UserFiles>b__1(System.IO.Enumeration.FileSystemEntry ByRef)
at System.IO.Enumeration.FileSystemEnumerable`1+DelegateEnumerator[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].ShouldIncludeEntry(System.IO.Enumeration.FileSystemEntry ByRef)
at System.IO.Enumeration.FileSystemEnumerator`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]].MoveNext()
at System.Collections.Generic.List`1[[System.__Canon, System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
at System.IO.Directory.GetFiles(System.String, System.String, System.IO.EnumerationOptions)
at System.IO.Directory.GetFiles(System.String, System.String)
at Xunit.ConsoleClient.ConsoleRunner.GetAvailableRunnerReporters()
at Xunit.ConsoleClient.ConsoleRunner.EntryPoint(System.String[])
at Xunit.ConsoleClient.Program.Main(System.String[])
.run-test: line 13: 19114: Abort(coredump)
Abort
How do I track those names back to the source code? Are those something my "demangle" command could make sense of, the way that works for C++ code?
Would it be useful for us to have a "feature" branch or something? Then I wouldn't have to cherry-pick your fixes out of the PRs, or you mine. :)
The second stacktrace seems to be same as the first one?
Oh. Right. Huh...
I could use some help tracking the flow from (for example) the files changed in #104448 into any temporary objects and then the deliverables I copy onto the target. It looks like the changed object (and behavior) is not getting onto my test setup.
For example, the key change is in pal_io.c so I looked for that:
cd .../artifacts
$ find . -name 'pal_io.*' -print
./obj/native/net9.0-illumos-Debug-x64/System.Native/CMakeFiles/System.Native-Static.dir/pal_io.c.o.d
./obj/native/net9.0-illumos-Debug-x64/System.Native/CMakeFiles/System.Native-Static.dir/pal_io.c.o
./obj/native/net9.0-illumos-Debug-x64/System.Native/CMakeFiles/System.Native.dir/pal_io.c.o.d
./obj/native/net9.0-illumos-Debug-x64/System.Native/CMakeFiles/System.Native.dir/pal_io.c.o
./obj/coreclr/illumos.x64.Debug/libs-native/System.Native/CMakeFiles/System.Native-Static.dir/pal_io.c.o.d
./obj/coreclr/illumos.x64.Debug/libs-native/System.Native/CMakeFiles/System.Native-Static.dir/pal_io.c.o
So does that land in the dotnet program? or where? Thanks
In this case, it's called libSystem.Native.so (as it is in src/native/libs/System.Native, which has the CMakeLists.txt file with the project(System.Native) directive), so I'd copy the assets from find artifacts/bin -iname 'libSystem.Native*' onto the VM.
artifacts/obj is the intermediate objects directory, which participates in building the product binaries that go in artifacts/bin and later artifacts/packages.
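Given that artifacts/bin holds the product binaries, one quick sanity check before copying anything is to see whether those binaries were actually rebuilt after the source edit. The sketch below is one way to do that by comparing modification times; the `stale_outputs` helper is hypothetical and relies only on the standard `find ... ! -newer` test.

```shell
#!/bin/sh
# stale_outputs SOURCE_FILE OUTPUT_DIR NAME_PATTERN
# Hypothetical helper: list files under OUTPUT_DIR matching NAME_PATTERN that
# are NOT newer than SOURCE_FILE, i.e. that were not rebuilt after the edit.
stale_outputs() {
    src=$1
    out_dir=$2
    pattern=$3
    find "$out_dir" -type f -name "$pattern" ! -newer "$src"
}
```

For example, `stale_outputs src/native/libs/System.Native/pal_io.c artifacts/bin 'libSystem.Native*'` prints nothing when every matching binary is newer than the edited source file.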
Separately (not for each change like this one), it's a good idea to refresh the environment from time to time to avoid later surprises: rm -rf artifacts on linux, rebuild the clr+libs+packs subsets, copy the runtime tar.gz over to the illumos machine, and recreate ~/.dotnet (a helper script might come in handy to automate it).
I could use some help tracking the flow from (for example) the files changed in https://github.com/dotnet/runtime/pull/104448 into any temporary objects and then the deliverables I copy onto the target.
Personally what I've been doing is a full ./build.sh clr+libs+packs -cross -os illumos and then copying over artifacts/packages/Debug/Shipping/dotnet-runtime-9.0.0-dev-illumos-x64.tar.gz to the target. It's a little slow, but it appears to be reliable.
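The clean-rebuild-and-copy cycle described in this exchange could be collected into one helper. This is a sketch under several assumptions: it is run from the repo root on the Linux build box, the target is reachable over ssh under a host name you supply, and the tarball is simply unpacked into ~/.dotnet. The `refresh_runtime` function and the `DRY_RUN` switch are invented for illustration.

```shell
#!/bin/sh
# refresh_runtime TARGET_HOST
# Hypothetical helper for the clean rebuild and redeploy cycle.
# Set DRY_RUN=1 to print the commands instead of running them.
refresh_runtime() {
    target_host=$1
    run=${DRY_RUN:+echo}   # expands to "echo" only when DRY_RUN is non-empty
    name=dotnet-runtime-9.0.0-dev-illumos-x64.tar.gz
    $run rm -rf artifacts &&
        $run ./build.sh clr+libs+packs -cross -os illumos &&
        $run scp "artifacts/packages/Debug/Shipping/$name" "$target_host:/tmp/$name" &&
        $run ssh "$target_host" \
            "rm -rf ~/.dotnet && mkdir -p ~/.dotnet && cd ~/.dotnet && gzip -dc /tmp/$name | tar -xf -"
}
```

Running it once with `DRY_RUN=1` first shows exactly what would be executed, which is worth doing because the first step deletes the whole artifacts directory.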
How do I track those name back to the source code? Are those something my "demangle" command could make sense of the way that works for C++ code?
One thing you can do to get line numbers in these managed backtraces is to copy the symbol files over to the target. They live in .pdb files. So if you put System.Private.CoreLib.pdb next to System.Private.CoreLib.dll, the runtime will automatically add the file paths and line numbers to the backtraces. You can find these PDB files in artifacts/packages/Debug/Shipping/Microsoft.NETCore.App.Runtime.illumos-x64.9.0.0-dev.symbols.nupkg. This is just a zip file. The structure is a little different from that of dotnet-runtime-9.0.0-dev-illumos-x64.tar.gz, but you should be able to figure out how to copy the PDB files next to their corresponding DLL files (maybe there is a command line option to include these PDB files in the tar.gz file, but I have not checked).
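Matching PDB files to their DLLs can be done by base name. The following is a minimal sketch, assuming the symbols package has already been unzipped into a directory and the runtime tarball into another; `place_pdbs` is a hypothetical helper and makes no assumption about the directory layout inside either tree.

```shell
#!/bin/sh
# place_pdbs SYMBOLS_DIR RUNTIME_DIR
# Hypothetical helper: for every .pdb under SYMBOLS_DIR, copy it next to each
# .dll with the same base name under RUNTIME_DIR. PDBs with no matching DLL
# are skipped.
place_pdbs() {
    symbols_dir=$1
    runtime_dir=$2
    find "$symbols_dir" -type f -name '*.pdb' | while IFS= read -r pdb; do
        base=$(basename "$pdb" .pdb)
        find "$runtime_dir" -type f -name "$base.dll" | while IFS= read -r dll; do
            cp "$pdb" "$(dirname "$dll")/"
        done
    done
}
```

A possible sequence would be `unzip -q Microsoft.NETCore.App.Runtime.illumos-x64.9.0.0-dev.symbols.nupkg -d symbols` followed by `place_pdbs symbols ~/.dotnet`, with both directory names being placeholders.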
I'm not aware of a standalone demangling program. There is a library for it. The readme describes several types of mangling, so it could be useful:
https://github.com/benaadams/Ben.Demystifier
For the specific example:
System.IO.Enumeration.FileSystemEnumerableFactory+<>c__DisplayClass2_0.<UserFiles>b__1
The + indicates the start of a nested class. The <> at the start of a class name indicates a compiler-generated class. In this case DisplayClass means it is a closure of a lambda method. The name of the method where this lambda was defined is part of the name (UserFiles). So to put it all together, in the class FileSystemEnumerableFactory there is a method UserFiles that declared a lambda function, and it is currently executing. (It is worth noting that normally it should not be possible for this method to cause an access violation (aka segv). This indicates that memory corruption damaged the managed reference.)
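The naming rules just described are regular enough to apply mechanically. As a rough illustration (not a general demangler), the sketch below uses sed to reduce a lambda frame of exactly this shape to "outer class, dot, defining method"; the `lambda_origin` name is made up, and frames of any other shape are passed through unchanged.

```shell
#!/bin/sh
# lambda_origin FRAME
# Hypothetical helper: for a frame shaped like
#   Outer+<>c__DisplayClassN_M.<Method>b__K(...)
# print "Outer.Method". Frames that do not match are printed unchanged.
lambda_origin() {
    printf '%s\n' "$1" |
        sed 's/^\([^+]*\)+<>c[A-Za-z0-9_]*\.<\([^>]*\)>b__[0-9_]*.*$/\1.\2/'
}
```

Applied to the frame from the crash above, this yields `System.IO.Enumeration.FileSystemEnumerableFactory.UserFiles`, which is the method to look for in the source tree.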
@AustinWise if stacktrace is the same as before then either https://github.com/dotnet/runtime/pull/104448 fix didn't work, or test was done with old binaries. Maybe try running the same xunit.console.dll command to see if it repros on your box?
Yeah, the xunit.console.dll on the test system (after copying as above) shows old dates. Will try removing the artifacts directory.
@AustinWise if stacktrace is the same as before then either #104448 fix didn't work, or test was done with old binaries. Maybe try running the same xunit.console.dll command to see if it repros on your box?
I checked the System.Diagnostics.Process tests to see if there was anything different. The runner gets past the test discovery phase without hitting the crash.
For what it's worth, the crash reproduced 100% of the time before my fix and reproduced 0% of the time after the fix. I have tested the fix both on SmartOS and OpenIndiana.
FYI on a gdb problem I'm having: .NET translates SIGFPE into DivideByZeroException. I noticed that when I'm attached to a process using GDB, this translation breaks. Something zeros out the siginfo->si_code that .NET relies upon to classify these signals. I'm not sure if this is a .NET problem, GDB problem, or illumos problem. Since I don't want to deal with that rabbit hole right now, I've hacked in a fix so I can keep using GDB: https://github.com/AustinWise/runtime/commit/f9f5886aac8caaa5254ad5509665bf987125f97b
Cool. Callstack was showing GetAvailableRunnerReporters(), which runs at the beginning before the tests execution. Hopefully, it will work for @gwr as well after the fresh build.
I noticed a problem with exception handling. .NET translates SIGSEGV into NullReferenceException. The sigsegv_handler is configured to use an alternate stack with sigaltstack. This handler does not behave like a normal signal handler: it switches the stack back to the original stack and resumes executing code. It never returns from the signal handler. On Linux this works fine: linux does not keep track of whether or not a signal handler returned after using the alternate stack. illumos however sets a bit called SS_ONSTACK when dispatching to a signal handler on an alternate stack and clears this bit when the handler returns. Before dispatching a signal, it checks to see if the SS_ONSTACK bit is set. If it is set, the alternate stack is not used.
.NET assumes that the alternate stack is always used for signal handlers. This means when it uses SwitchStackAndExecuteHandler to switch stacks, it is actually just moving up the stack a bit. This causes the siginfo and context parameters passed to the signal handler to be clobbered. Sadness ensues.
Here is a minimal C# reproduction program: https://github.com/AustinWise/CrashRepro/blob/master/csharp/Program.cs . It should print "Did not crash.". On illumos it will either crash with an unhandled AccessViolationException or an unhandled SIGSEGV. There is also a library test that triggers this behavior:
dotnet xunit.console.dll System.Runtime.Tests.dll -method "System.Tests.TupleTests.Equals_GetHashCode"
There is an existing environment variable that is supposed to work around this: DOTNET_EnableAlternateStackCheck=1. However it appears this check does not work correctly. It checks to see if the point at which execution was interrupted by the signal is on an alternate stack. It should probably check whether the current stack the signal handler is using is the alternate stack. I have a commit that makes IsRunningOnAlternateStack more accurate and makes the aforementioned test program behave correctly: https://github.com/AustinWise/runtime/commit/6417f82ee3097bdbd8c78d16bd1ae610115fb98f
I'm not sure what the correct fix would be. Not use alternate stacks on illumos? Switch stacks by manipulating the context passed to the signal handler and returning from signal handler?
@gwr
I took a stab at the System.Diagnostics.Process support. The first commit sets up the build system and the function definitions needed. They all still throw PlatformNotSupportedException: https://github.com/AustinWise/runtime/commit/361f64a6abb0d7420c5f4249f7d22a6ad5015670
The second commit is hacky and incomplete. It is enough to get the RemoteExecutor working, which unblocks running a lot of tests: https://github.com/AustinWise/runtime/commit/c48ae3d4e3e350df59d9d41777ce2aaa5474663d Note that some elements of it are copy-pasted from the linux version. While linux uses a text based format and illumos uses a binary format, the general structure is similar.
I suspect I'm going to be busy for the next couple of weeks and won't have time to push this work forward during that time. I achieved my personal goal of getting the System.Runtime.Tests mostly working when run on my branch. The remaining failures look like they are caused by time zone data, but I have not looked into these deeply to confirm:
FYI on a gdb problem I'm having: .NET translates SIGFPE into DivideByZeroException. I noticed that when I'm attached to a process using GDB, this translation breaks. Something zeros out the siginfo->si_code that .NET relies upon to classify these signals. I'm not sure if this is a .NET problem, GDB problem, or illumos problem. Since I don't want to deal with that rabbit hole right now, I've hacked in a fix so I can keep using GDB: AustinWise@f9f5886
I've been doing some work on gdb, and I might like to look at this too. Is there any small reproduction environment available for looking at what gdb is doing with this?
Here is a minimal C# program that reproduces the problem, reduced from this System.Runtime.Tests case:
using System;
using System.Runtime.CompilerServices;

try
{
    Console.WriteLine(TestDiv(1, 0));
}
catch (DivideByZeroException)
{
    Console.WriteLine("PASS");
}

[MethodImpl(MethodImplOptions.NoInlining)]
static long TestDiv(long a, long b)
{
    return a / b;
}
It runs fine without GDB attached (prints "PASS"). When GDB is attached, it crashes with this error:
Process terminated. InternalError
at System.Environment.<FailFast>g____PInvoke|11_0(System.Runtime.CompilerServices.StackCrawlMarkHandle, UInt16*, System.Runtime.CompilerServices.ObjectHandleOnStack, UInt16*)
at System.Environment.FailFast(System.Runtime.CompilerServices.StackCrawlMarkHandle, System.String, System.Runtime.CompilerServices.ObjectHandleOnStack, System.String)
at System.Environment.FailFast(System.Threading.StackCrawlMark ByRef, System.String, System.Exception, System.String)
at System.Environment.FailFast(System.String)
at System.Runtime.EH.FallbackFailFast(System.Runtime.RhFailFastReason, System.Object)
at System.Runtime.EH.FailFastViaClasslib(System.Runtime.RhFailFastReason, System.Object, IntPtr)
at System.Runtime.EH.RhThrowHwEx(UInt32, ExInfo ByRef)
at Program.<<Main>$>g__TestDiv|0_0(Int64, Int64)
at Program.<Main>$(System.String[])
This crash is reproducible on both my SmartOS and OpenIndiana systems, which are using GDB 7 and 14 respectively.
This isn't new. Let's discuss the signals issue where it belongs: https://github.com/dotnet/runtime/issues/35362 and keep this tracking issue limited to high-level milestones. When you run the PAL tests, you will find the platform differences.
OK. Sorry for making this ticket a bit "chatty". If I could have an email for you, I could use that for some of the "how do I..." questions and the like instead of making yet more noise here. My email is in all my commits. Thanks.
@gwr, I only meant to keep this issue as the main tracking one and branch off into separate issues (https://github.com/dotnet/runtime/issues) or discussions (https://github.com/dotnet/runtime/discussions) for specific concerns. This way we can call for help from other community members and area owners. In the current state of this thread, it is not easy to track each conversation, and mentioning someone on an issue labeled area-Meta would not be effective.
(also, I do not know all the answers, but I can help navigating things -- preferably on GitHub in open forums)
I took a stab at the System.Diagnostics.Process support. The first commit sets up the build system and the function definitions needed. They all still throw PlatformNotSupportedException: ...
That's interesting. Your "skeleton" looks somewhat like the Linux code (confirmed below). I was trying to work from the FreeBSD code, sharing the same BSD parts that Apple and FreeBSD share; e.g. the resource control calls should work the same on illumos.
The second commit is hacky and incomplete. It is enough to get the RemoteExecutor working, which unblocks running a lot of tests: ... Note that some elements of it are copy-pasted from the Linux version. While Linux uses a text-based format and illumos uses a binary format, the general structure is similar.
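To make the text versus binary contrast concrete, here is a rough C sketch (not the code from the commit). It assumes the Linux side reads a `/proc/<pid>/stat` style line of the form `pid (comm) state ppid ...`. The `demo_psinfo` struct is a made-up, trimmed stand-in and does not match the real illumos `psinfo_t` layout; both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down record; NOT the real illumos psinfo_t. */
struct demo_psinfo {
    int  pid;
    int  ppid;
    char fname[16];
};

/* Linux style: parse "pid (comm) state ppid ..." text. */
int parse_stat_text(const char *line, struct demo_psinfo *out)
{
    const char *open, *close;
    char state;
    size_t len;

    assert(line != NULL && out != NULL);
    open = strchr(line, '(');
    close = strrchr(line, ')');  /* comm may itself contain ')' */
    if (open == NULL || close == NULL || close < open)
        return -1;
    if (sscanf(line, "%d", &out->pid) != 1)
        return -1;
    if (sscanf(close + 1, " %c %d", &state, &out->ppid) != 2)
        return -1;

    /* Copy the name between the parentheses, truncating if needed. */
    len = (size_t)(close - open - 1);
    if (len >= sizeof out->fname)
        len = sizeof out->fname - 1;
    memcpy(out->fname, open + 1, len);
    out->fname[len] = '\0';
    return 0;
}

/* illumos style: the file content is a fixed-layout struct, so it is
 * copied out whole instead of being tokenized. */
int parse_psinfo_binary(const void *buf, size_t len, struct demo_psinfo *out)
{
    assert(buf != NULL && out != NULL);
    if (len < sizeof *out)
        return -1;  /* short read */
    memcpy(out, buf, sizeof *out);
    out->fname[sizeof out->fname - 1] = '\0';  /* force termination */
    return 0;
}
```

Both paths fill the same record, which is one way to read "the general structure is similar": only the decoding step differs per platform.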
I've built what's on your branch, and can now reproduce your test results. Thanks.
I've made good progress thanks to the help from @AustinWise (thanks again!). No more failures in the System.Diagnostics.Process.Tests
Here's a github compare link for the latest: https://github.com/dotnet/runtime/compare/main...gwr:illumos5
Should I start opening pull requests for all of those changes? Or how best to proceed?
Any guidance on what to work on next among those libraries?
Great progress @gwr! I think you can open a PR for review. Note that maintainers may be busy with .NET 9 preview 7 preparations, so it may take a while. @AustinWise and I can take a look.
Also note that there is one illumos fix I ninja'd in https://github.com/dotnet/runtime/pull/105178, which is blocked due to p7 prep (Environment.SunOS file).
Here's the PR for code that runs System.Diagnostics.Process.Tests (skips but no fails) https://github.com/dotnet/runtime/pull/105403
BTW, I tried rebasing on main from Mon. this week and ran into problems downloading stuff. Not sure why, but it didn't seem to have anything to do with my changes.
Note this needs https://github.com/dotnet/runtime/pull/105207 integrated before it's fully functional including exception handling. [ since merged ]
Cut from https://github.com/dotnet/runtime/issues/4173.
Given below is a high-level list of work items for Solaris x86-64 port:
[x] Native configurations (#34756)
[x] CoreCLR native components (#35173)
awaiting next release of libunwind https://github.com/libunwind/libunwind/releases/tag/v1.5-rc1 or higher with changes from https://github.com/libunwind/libunwind/pull/171, for Solaris support.
[ ] PAL tests
priocntl(2) directly for SunOS targets, or wait for https://www.illumos.org/issues/4963.
[x] Libraries native components (#34867)
[x] Mono native components (#37560)
[x] Installer native components (#34263)
[ ] MSBuild configurations
[x] CoreCLR managed components (#36266)
[ ] Libraries managed components
[ ] Installer managed components
[x] CoreCLR tests (#37824)
[ ] Libraries tests
[ ] Mono tests
[ ] Installer tests
[ ] Packaging configurations
[x] RID (#37016)
[x] Cross compilation on Linux (dotnet/arcade#5584, dotnet/dotnet-buildtools-prereqs-docker#324, #37753)
[x] SDK (dotnet/sdk#12198)
[ ] CI hook
Shipping
directory), e.g. https://github.com/am11/runtime/releases/tag/5.0.0-dev.1