dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Port corehost to QNX7 #33374

Open guesshe opened 4 years ago

guesshe commented 4 years ago

Hi,

I am trying to port the entire runtime to qnx7 platform on x64 arch. I am able to build coreclr but it won't run unless I have dotnet executable built. Any suggestions on how to build corehost for qnx?

jkotas commented 4 years ago

Any suggestions on how to build corehost for qnx?

The same way as coreclr? It lives under https://github.com/dotnet/runtime/tree/master/src/installer/corehost

guesshe commented 4 years ago

How about the .nuget packages downloaded for specific RID? I used this repo https://github.com/dotnet/core-setup/tree/v2.2.8, when I tried on linux, it pulls down some .nuget files for linux platform, but I don't have these files for QNX to pull down.

jkotas commented 4 years ago

You may want to build it from dotnet/runtime repo. dotnet/runtime has everything together that avoids the issues with publishing and downloading packages between repos.

guesshe commented 4 years ago

@jkotas Oh. Thanks! Shall I start with all subprojects or only coreclr and corehost should be enough for me?

jkotas commented 4 years ago

You can start src\coreclr, src\libraries\Native and corehost; and get the managed libraries from other Unix flavor.

guesshe commented 4 years ago

Thanks! By saying managed libraries, do you mean the .dll libraries?

jkotas commented 4 years ago

Right

guesshe commented 4 years ago

@jkotas I tried the dotnet core 5.0.0-dev on linux and it can build a dotnet binary under the artifacts directory, but when I tried to execute it, it gave me an error "A fatal error occurred. The folder [/home//Github/runtime/artifacts/obj/linux-x64.Debug/cli/dotnet/host/fxr] does not exist". This is the same error I got when I tried the v2.2.8 version of corehost on linux. If I download the cli tar file and untar it, it has a host sub-directory. What did I miss? Is the built dotnet directly executable, or do I have to do some post-processing?

jkotas commented 4 years ago

obj is directory for intermediate build files. It does not have the right directory layout.

Try the one under bin, e.g. artifacts/bin/testhost/netcoreapp5.0-linux-Debug-x64

guesshe commented 4 years ago

@jkotas Thanks! I will try it out and let you know the progress.

guesshe commented 4 years ago

@jkotas Can I publish my app to netcore sdk 5.0.0-dev? Or the other way around, can I build dotnet/runtime for sdk version 3? The following command builds a dotnet executable, but it is missing the host folder and can't run from there. It doesn't build the artifacts/bin/testhost folder though.

/home//Github/runtime/src/installer/corehost/build.sh Debug x64 -apphostver "2.1.802" -hostver "2.1.802" -fxrver "2.1.802" -policyver "2.1.802" -commithash "fc2e56c8e8d60180d9ca6ddff67076d779fd4a43"

jkotas commented 4 years ago

What typically works best for initial bring-ups like this is to publish a standalone app (e.g. using dotnet publish -r linux-x64) and then overwrite the binaries with what you have built.

guesshe commented 4 years ago

@jkotas Thanks! I tried replacing the dotnet executable with my own built version of 5.0.0-dev and it seems to be working. So I think my next step is to build QNX versions of the following shared libraries and replace them, am I correct? Do I really need libuv.so and libe_sqlite3.so? They are under AspNet, not NetCore.

./shared/Microsoft.AspNetCore.All/2.2.8/libuv.so
./shared/Microsoft.AspNetCore.All/2.2.8/libe_sqlite3.so
./shared/Microsoft.NETCore.App/2.2.8/libhostpolicy.so
./shared/Microsoft.NETCore.App/2.2.8/System.Native.so
./shared/Microsoft.NETCore.App/2.2.8/libmscordbi.so
./shared/Microsoft.NETCore.App/2.2.8/libmscordaccore.so
./shared/Microsoft.NETCore.App/2.2.8/libcoreclr.so
./shared/Microsoft.NETCore.App/2.2.8/System.IO.Compression.Native.so
./shared/Microsoft.NETCore.App/2.2.8/System.Security.Cryptography.Native.OpenSsl.so
./shared/Microsoft.NETCore.App/2.2.8/libsos.so
./shared/Microsoft.NETCore.App/2.2.8/libcoreclrtraceptprovider.so
./shared/Microsoft.NETCore.App/2.2.8/libsosplugin.so
./shared/Microsoft.NETCore.App/2.2.8/System.Globalization.Native.so
./shared/Microsoft.NETCore.App/2.2.8/libclrjit.so
./shared/Microsoft.NETCore.App/2.2.8/System.Net.Http.Native.so
./shared/Microsoft.NETCore.App/2.2.8/libdbgshim.so
./shared/Microsoft.NETCore.App/2.2.8/System.Net.Security.Native.so
./host/fxr/2.2.8/libhostfxr.so

jkotas commented 4 years ago

Do I really need libuv.so and libe_sqlite3.so?

It depends on the ASP.NET Core you are planning to use, and how you plan to configure it.

am11 commented 4 years ago

libuv is not required for ASP.NET Core (it is an optional provider for KestrelHttpServer, primary backend is .NET's own managed sockets). libe_sqlite3 (which comes from https://github.com/ericsink/SQLitePCL.raw) is required only when EntityFramework Core is used with SQLite provider.

guesshe commented 4 years ago

@am11 @jkotas Thanks!

guesshe commented 4 years ago

Any idea how this shared library is built? ./shared/Microsoft.NETCore.App/2.2.8/System.Net.Http.Native.so. I didn't find it after building with src/libraries/Native/build-native.sh

jkotas commented 4 years ago

This library no longer exists in dotnet/runtime repo.

guesshe commented 4 years ago

@jkotas Thanks! I will work on the rest then.

guesshe commented 4 years ago

For the managed libraries (.dll), can I reuse 2.2.8 sdk version? Only replacing .so and .a libraries with my own built version.

jkotas commented 4 years ago

You are likely going to run into mismatches when combining 2.2.8 managed libraries with latest native binaries from dotnet/runtime

guesshe commented 4 years ago

I am able to build corehost but got an ELF error while executing it on a QNX device. I am debugging why it happened.

guesshe commented 4 years ago

@jkotas Is netcore 5 sdk available to try out?

jkotas commented 4 years ago

Yes, you can download the daily builds at https://github.com/dotnet/core-sdk#installers-and-binaries

guesshe commented 4 years ago

@jkotas Thanks!

guesshe commented 4 years ago

I managed to build the dotnet executable using clang (built for QNX specifically), but when I used ldd to check dependencies, I got the following error:

ldd: /tmp/dotnet: Exec format error

The readelf command showed the following required libs, and they are all present on the OS:

0x0000000000000001 (NEEDED) Shared library: [libm.so.3]
0x0000000000000001 (NEEDED) Shared library: [libiconv.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.4]

Any idea why this error occurred?

am11 commented 4 years ago

Exec format error

sounds like it got built for different architecture. if there is readelf(1) available, maybe try readelf -h $(command -v dotnet) | grep 'Class\|File\|Machine', e.g.:

$ readelf -h .dotnet/dotnet | grep 'Class\|File\|Machine'
  Class:                             ELF64
  Machine:                           Advanced Micro Devices X86-64

also does ldd -v /path/to/dotnet show something interesting?

btw, is there anything like vagrant box or a regular vm available for qnx 7 for developers or is myqnx account mandatory for devs as well?

guesshe commented 4 years ago

@am11 It showed the following for both the Linux and QNX versions of dotnet:

  Class: ELF64
  Machine: Advanced Micro Devices X86-64

I would assume this is fine? There is a QNX version of ldd; when I ran it on QNX, it gave me "exec format error". I can't run the QNX version of ldd on Linux. As for your question, a myqnx account is required to download the sdk and tools for QNX; it comes with a one-month free trial license.

am11 commented 4 years ago

@guesshe, thanks, i was hoping for something like openqnx, which seems to also exist, but not sure how similar it is with QNX 7. :) Exec format error from compiled code on same system typically indicates that the compiler/linker has somehow picked up the incompatible toolchain. If you could share the build output with commands that were executed, that might help spotting such issue. Also, here are the ELF headers on Ubuntu (which I think should differ from QNX):

$ dotnet --version
3.1.100

$ readelf -h $(command -v dotnet)
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x408a2b
  Start of program headers:          64 (bytes into file)
  Start of section headers:          103952 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         30
  Section header string table index: 29
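When readelf is not available on the target, the same header fields can be compared with a few lines of code. The following is a minimal sketch, not part of the thread: the names ElfSummary, parse_elf_summary and read_elf_summary are hypothetical, and the field offsets and machine codes come from the ELF specification. It only looks at the first 20 bytes, so it will not catch differences in program headers, the interpreter path, or note sections, any of which might also matter for an "Exec format error".

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Subset of the ELF header fields that `readelf -h` prints.
struct ElfSummary {
    unsigned elfClass;  // e_ident[4]: 1 = ELF32, 2 = ELF64
    unsigned data;      // e_ident[5]: 1 = little endian, 2 = big endian
    unsigned osAbi;     // e_ident[7]: 0 = UNIX - System V
    unsigned machine;   // e_machine: 62 (0x3e) = x86-64, 183 = AArch64
};

// Parses the first 20 bytes of an ELF file. Returns false if the buffer is
// too short or the magic number does not match.
bool parse_elf_summary(const std::vector<unsigned char>& bytes, ElfSummary* out)
{
    if (out == nullptr || bytes.size() < 20)
        return false;
    if (bytes[0] != 0x7f || bytes[1] != 'E' || bytes[2] != 'L' || bytes[3] != 'F')
        return false;

    out->elfClass = bytes[4];
    out->data = bytes[5];
    out->osAbi = bytes[7];
    // e_machine is a 16-bit field at offset 18, stored in the file's byte order.
    if (out->data == 2)
        out->machine = (static_cast<unsigned>(bytes[18]) << 8) | bytes[19];
    else
        out->machine = bytes[18] | (static_cast<unsigned>(bytes[19]) << 8);
    return true;
}

// Reads the header bytes from a file on disk and parses them.
bool read_elf_summary(const std::string& path, ElfSummary* out)
{
    std::ifstream file(path, std::ios::binary);
    std::vector<unsigned char> bytes(20);
    if (!file.read(reinterpret_cast<char*>(bytes.data()),
                   static_cast<std::streamsize>(bytes.size())))
        return false;
    return parse_elf_summary(bytes, out);
}
```

Calling read_elf_summary on the Linux-built and the QNX-built dotnet and printing the four fields side by side gives the same comparison as the grep over readelf -h output above.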

am11 commented 4 years ago

Two FYIs, you can also:

(assuming before -subsetCategory installer, -subsetCategory coreclr and -subsetCategory libraries were built using the same compiler; preferably start clean rm -rf artifacts or git clean -xdf)

guesshe commented 4 years ago

@am11 Thanks! I am actually doing a cross-compiling. I will try your suggestions now.

guesshe commented 4 years ago

@am11 This is the command I use:

/home//Github/runtime/src/installer/corehost/build.sh Debug x64 -apphostver "5.0.0-dev" -hostver "5.0.0-dev" -fxrver "5.0.0-dev" -policyver "5.0.0-dev" -commithash "fc2e56c8e8d60180d9ca6ddff67076d779fd4a43"

I have built coreclr, but only for v2.2.8; there are lots of changes I have to make in order to build it using the QNX toolchain.

guesshe commented 4 years ago

Here is the link.txt under the cmake-generated build directory for the dotnet executable:

/home//PubGitRepo/llvm-project/build/bin/clang++ --target=x86_64-pc-qnx700-gnu -DQNX -DQNXNTO -DX86_64 -DLITTLEENDIAN__ -isystem /home//qnx700/target/qnx7/usr/include -isystem /home//qnx700/target/qnx7/usr/include/c++/v1 -std=c++11 -g -lc++ -lm -stdlib=libstdc++ -l/home//qnx700/target/qnx7/x86_64/usr/lib/libc++.a -l/home//qnx700/target/qnx7/x86_64/usr/lib/libintl.a -l/home//qnx700/target/qnx7/x86_64/usr/lib/libiconv.so -Wl,--build-id=sha1 -Wl,-z,relro,-z,now -fPIE -pie CMakeFiles/dotnet.dir//fxr_resolver.cpp.o CMakeFiles/dotnet.dir///corehost.cpp.o -o dotnet ../hostmisc/libhostmisc.a

am11 commented 4 years ago

@guesshe, the command looks correct for x86-64 target system. Is the target device (where you are getting Exec format error) using the same architecture or is it aarch64?

guesshe commented 4 years ago

@am11 Thanks! I solved my issue by using qcc instead of clang. Maybe clang picked up something that messed up my cross-compiling environment. Now I am facing another issue where #define symlinkEntrypointExecutable "/proc/self/exefile" doesn't exist in QNX. I am working on finding an alternative solution.

am11 commented 4 years ago

symlinkEntrypointExecutable "/proc/self/exefile"

@guesshe, I had this problem when running dotnet in Linux emulator on FreeBSD. Although FreeBSD itself has a syscall for that https://github.com/dotnet/runtime/blob/b0351370ccd132d95c97b75312fc36adaacc2664/src/installer/corehost/cli/hostmisc/pal.unix.cpp#L698-L708 but emulator required mounting procfs to Linux chroot. You may want to try out the same on QNX https://www.qnx.com/developers/docs/6.5.0SP1.update/com.qnx.doc.neutrino_cookbook/s3_procfs.html to overcome this situation.
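A platform-specific variant of the executable-path lookup could look roughly like the sketch below. It is not the code from pal.unix.cpp: the helper name read_exe_path_entry is hypothetical, and it assumes that a procfs entry is either a symbolic link to the executable (as /proc/self/exe is on Linux) or a regular file whose first line is the path. Whether the QNX procfs offers an entry of the second kind, and under which name, has to be checked on the target.

```cpp
#include <cerrno>
#include <climits>
#include <fstream>
#include <string>
#include <unistd.h>

// Returns the path stored in a procfs-style entry, or an empty string.
// A symbolic link yields its target. A regular file yields its first line;
// that a given QNX procfs entry behaves this way is an assumption.
std::string read_exe_path_entry(const std::string& entry)
{
    char buffer[PATH_MAX];
    ssize_t length = readlink(entry.c_str(), buffer, sizeof(buffer));
    if (length > 0)
        return std::string(buffer, static_cast<size_t>(length));

    // readlink() fails with EINVAL when the entry exists but is not a symlink.
    if (length == -1 && errno == EINVAL)
    {
        std::ifstream file(entry);
        std::string line;
        if (std::getline(file, line))
            return line;
    }
    return std::string();
}
```

The caller passes whichever entry the platform provides and treats an empty result as "not available", so it can fall back to another mechanism such as the path in argv[0].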

guesshe commented 4 years ago

@am11 Thanks! I will add a QNX version of it.

guesshe commented 4 years ago

I am able to build and run the dotnet executable. But when I pointed it to my published, self-contained hello_world application, it reported the following errors. I suspect this has something to do with how I build coreclr (note my coreclr is still on version 2.2.8). Any idea how to debug this issue from the coreclr perspective?

unknown symbol: _ZN3ETW5GCLog11FireGcStartEPNS0_14st_GCEventInfoE referenced from libcoreclr.so
unknown symbol: __tls_get_addr referenced from libcoreclr.so

am11 commented 4 years ago

@guesshe, perhaps it is due to FEATURE_EVENT_TRACE. We disabled this method when FEATURE_EVENT_TRACE is disabled for Android, just a few days ago: https://github.com/dotnet/runtime/blob/91f14182958b0fad9c9b4dc7d908ff955581979b/src/coreclr/src/vm/gctoclreventsink.cpp. Reason for disabling event tracing feature was that lttng-ust library is not available on Android (at least via Termux package manager, it is not). If QNX is also missing liblttng-ust, you can try building coreclr by disabling this feature:

# perform a full coreclr build (native+managed components)
./build.sh -subsetcategory coreclr -cmakeargs -DFEATURE_EVENT_TRACE=0

# or only native components
./src/coreclr/build-runtime.sh -cmakeargs -DFEATURE_EVENT_TRACE=0

then you will likely overcome the missing _ZN3ETW5GCLog11FireGcStartEPNS0_14st_GCEventInfoE issue. Also, please note that it is best to keep the versions of installer, libraries and coreclr subset categories in sync, i.e. build from same SHA-1 git hash. This will avoid running into API/ABI mismatches or missing symbols issues. I can imagine it is somewhat challenging to keep up with the running master, for that I suggest to distill to a good/known SHA-1 (e.g. from release/5.0-preview2 branch) and make that build (it's more work but worth it since you are in best position to pull it off). :)

guesshe commented 4 years ago

@am11 Thanks! I modified clrfeatures.cmake to have set(FEATURE_EVENT_TRACE 0) if FEATURE_EVENT_TRACE is not defined. Is this the same as you pointed out to disable via cmakeargs?

am11 commented 4 years ago

Yes, it is the same thing (if it is compiling 🙂). For Android, it is disabled very early in the build: https://github.com/dotnet/runtime/blob/25fdaa850f492a9b4144670cac3522bd5b57cd6f/eng/common/cross/toolchain.cmake#L57 (when cmake sets up the toolchain for cross-compilation specified by CMAKE_TOOLCHAIN_FILE in gen-buildsys.sh; this toolchain.cmake script is invoked by cmake before project's first cmake script)

guesshe commented 4 years ago

@am11 Is clang a must to build? Is it possible to use gcc? I had issue with using clang to build dotnet executable, which resulted in "exec format error".

guesshe commented 4 years ago

Can someone please explain to me a bit more about how does dotnet executable load a .dll application? It would be helpful to my debugging. Thanks in advance!

am11 commented 4 years ago

-clang is the default when no compiler is specified; you can use -gcc as you have done before: https://github.com/dotnet/runtime/issues/33374#issuecomment-602789459

guesshe commented 4 years ago

@am11 Thanks! It seems coreclr 2.2.8 doesn't support gcc. My build was on 2.2.8. I will bring my changes to net5 and try it from there.

wfurt commented 4 years ago

For the record, you can read part of the FreeBSD saga at https://github.com/wfurt/corefx/wiki/Building-.NET-Core--2.x-on-FreeBSD and https://github.com/wfurt/corefx/wiki/Building-.NET-Core-3.x-on-FreeBSD. (This is a clone, as the original wiki got lost in the runtime transition.)

It outlines different strategies at different maturity stages. In general, getting the managed part built turned out to be the bigger challenge. The latest effort is captured in https://github.com/dotnet/runtime/pull/34000, where we can cross-compile the native bits and use the rest of the build "normally". When I was looking at clang support, I did notice QNX in the list as well. One more note: getting changes into master is relatively OK, but it is nearly impossible to get permission for maintenance branches, e.g. 2.x and 3.x. So even if you manage to get it working there, there is no avenue to take that work. Moving to master/5 is the right choice IMHO.

guesshe commented 4 years ago

@wfurt Thanks for sharing this!

guesshe commented 4 years ago

@wfurt @am11 I am almost there completing compiling net5 coreclr but I am facing following issue. My compiler is gcc 5.4.0.

/home//Github/runtime/src/coreclr/src/pal/src/arch/amd64/context2.S: Assembler messages:
/home//Github/runtime/src/coreclr/src/pal/src/arch/amd64/context2.S:90: Error: unbalanced parenthesis in operand 1.
/home//Github/runtime/src/coreclr/src/pal/src/arch/amd64/context2.S:90: Error: missing ')'
/home//Github/runtime/src/coreclr/src/pal/src/arch/amd64/context2.S:90: Error: missing ')'
/home//Github/runtime/src/coreclr/src/pal/src/arch/amd64/context2.S:188: Error: unbalanced parenthesis in operand 1.
/home//Github/runtime/src/coreclr/src/pal/src/arch/amd64/context2.S:188: Error: missing ')'
/home//Github/runtime/src/coreclr/src/pal/src/arch/amd64/context2.S:188: Error: missing ')'

guesshe commented 4 years ago

@am11 OK. So I solved my issue by using clang3.9 as the assembler. But now I am facing another issue when trying to run a helloworld.dll application:

ASSERT [EXCEPT ] at /home/rihe/Github/runtime/src/coreclr/src/pal/src/exception/signal.cpp.971: handle_signal: sigaction() call failed with error code 48 (Not supported)

I would assume this has something to do with QNX's implementation of the sigaction() call?

am11 commented 4 years ago

gcc 5.4.0

Maybe try compiling with CXXFLAGS=-Wa,--divide e.g. CXXFLAGS=-Wa,--divide ./src/coreclr/build-runtime.sh

At some point we achieved the support from gcc 4.9 to 9, however, CI is only testing gcc 7. Need some cycles to fix build on older GCC.

sigaction() call failed with error code 48 (Not supported)

Does it work with this patch:

diff --git a/src/coreclr/src/pal/src/exception/signal.cpp b/src/coreclr/src/pal/src/exception/signal.cpp
index d6d8256610e..5d80be4ffe6 100644
--- a/src/coreclr/src/pal/src/exception/signal.cpp
+++ b/src/coreclr/src/pal/src/exception/signal.cpp
@@ -960,9 +960,9 @@ Parameters :
 --*/
 void restore_signal(int signal_id, struct sigaction *previousAction)
 {
-    if (-1 == sigaction(signal_id, previousAction, NULL))
+    if (signal(signal_id, [](int signum) { (void)signum; /* ignored */ }) == SIG_ERR)
     {
-        ASSERT("restore_signal: sigaction() call failed with error code %d (%s)\n",
+        ASSERT("restore_signal: signal() call failed with error code %d (%s)\n",
             errno, strerror(errno));
     }
 }

(we can make it nicer with cmake introspection etc. later)
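To find out which part of the sigaction() request the target rejects, it may help to probe sa_flags values one at a time outside the runtime. The sketch below is only an illustration: probe_sigaction_flags is a hypothetical helper, and the flags worth trying (for example SA_RESTART, SA_SIGINFO, SA_ONSTACK) are an assumption that should be checked against what signal.cpp actually passes.

```cpp
#include <cerrno>
#include <cstring>
#include <signal.h>

// Tries to install a do-nothing handler for `signo` with the given sa_flags,
// then restores the previous disposition. Returns 0 on success or the errno
// value reported by sigaction().
int probe_sigaction_flags(int signo, int flags)
{
    struct sigaction probe;
    struct sigaction previous;
    std::memset(&probe, 0, sizeof(probe));
    sigemptyset(&probe.sa_mask);
    probe.sa_flags = flags;
    if (flags & SA_SIGINFO)
        probe.sa_sigaction = [](int, siginfo_t*, void*) {};
    else
        probe.sa_handler = [](int) {};

    if (sigaction(signo, &probe, &previous) == -1)
        return errno;

    // Put the original disposition back so the probe has no lasting effect.
    sigaction(signo, &previous, nullptr);
    return 0;
}
```

Running the probe on the device for each signal and flag combination and printing strerror() of any non-zero result should show whether error 48 is tied to a particular flag or to a particular signal.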