dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Can't build 2.0.0 for armel (debian rootfs) #2892

Closed alucryd closed 4 years ago

alucryd commented 7 years ago

Steps to reproduce

Build v2.0.0 from the official tarball after generating a debian 8 rootfs.

./build.sh -release -verbose -CrossBuild=true -DistroRid=debian.8-armel -TargetArchitecture=armel -StripSymbols=true -SkipTests=true -DisableCrossgen=true

Expected behavior

Build should proceed until the end.

Actual behavior

Build fails, build log: https://paste.xinu.at/2FE/

Environment data

Followed official procedures on Ubuntu Server 16.04, building with llvm and clang 3.9. Note that I added -static-libgcc and -static-libstdc++ linker flags to cross/armel/toolchain.cmake because the device I'm targeting sports an older libstdc++.

LukePulverenti commented 7 years ago

This is currently blocking our ability to release software built with .NET Core on many armel-based NAS devices.

steveharter commented 7 years ago

@alucryd the link provided no longer works.

Are you trying to build the dotnet/core repo?

alucryd commented 7 years ago

@steveharter I managed to build coreclr, corefx and corehost separately with the debian rootfs after applying the ubuntu trusty patch for armhf; it's also needed for debian armel. I replaced all files from an official 2.0.0 x64 runtime with those I built (except createdump, which I didn't find in any of those 3 parts, so I removed it from the JSON file).

Also modified CMakeLists for all 3 parts to add -static-libgcc and -static-libstdc++ linker flags. It works as intended for my custom x64 runtime (no more dep on libstdc++ or libgcc), but not as much for armel. There is no more dependency on libstdc++ but everything's still linked against libgcc_s somehow. Will try -fno-builtin later as per a suggestion on stack.
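To see which shared libraries a built binary still depends on, its dynamic section can be inspected. The following is a minimal sketch, assuming GNU `readelf` from binutils is installed. The function names and the path in the usage comment are illustrative and not part of any of the repos.

```shell
#!/bin/sh
# Extract library names from the NEEDED lines of `readelf -d` output on stdin.
needed_libs_from_readelf() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Print the NEEDED entries of one ELF file (requires readelf from binutils).
needed_libs() {
    readelf -d "$1" | needed_libs_from_readelf
}

# Example (path is illustrative):
#   needed_libs bin/Product/Linux.armel.Release/libcoreclr.so | grep libgcc_s
```

If the `grep` in the usage example prints a line, the library is still dynamically linked against libgcc_s.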

The custom x64 runtime works like a breeze on a wide range of systems I tried. However, I can't run anything with the armel runtime: trying to launch a dll fails almost immediately with the following:

Failed to initialize CoreCLR, HRESULT: 0x80004005

Here is the full strace output if it helps: https://paste.xinu.at/OoSk/

steveharter commented 7 years ago

I see. Can you export COREHOST_TRACE=1 in your environment to see if the trace output for the host @ libhostfxr.so seems correct before it loads and hands off to coreclr? If that looks correct, I'll create a new issue in the core repo instead of this repo (core-setup). Thanks
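A minimal sketch of capturing that trace to a file, assuming the host writes its trace output to standard error. The helper name `run_with_host_trace`, the log file name, and the app name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Run a command with host tracing enabled and save its stderr to a log file.
# Usage: run_with_host_trace LOGFILE COMMAND [ARGS...]
run_with_host_trace() {
    trace_log=$1
    shift
    COREHOST_TRACE=1 "$@" 2>"$trace_log"
}

# Example (app name is illustrative):
#   run_with_host_trace host-trace.log dotnet HelloWorld.dll
```

Setting the variable only for the one command keeps it out of the rest of the shell session.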

alucryd commented 7 years ago

Apologies for the delay, had other things on my hands but I finally took the time to get back to this.

Here is the full log with trace enabled: https://paste.xinu.at/H7Lt6e/

steveharter commented 7 years ago

Nothing stands out in the host trace output; it called into coreclr to initialize, which returned 0x80004005.

The strace output showed the last module loaded was liburcu-common.so.2; I'm not sure that was the last static module to be loaded.

@alucryd have you tried debugging under gdb or other? That's what I would do if I had a local repro.

@janvorli any thoughts on this custom build?

janvorli commented 7 years ago

@alucryd as for

I replaced all files from an official 2.0.0 x64 runtime with those I built

Have you built and replaced the managed assemblies as well? If not, then this would not work, at least not for System.Private.CoreLib.dll. Also, the build kind of System.Private.CoreLib.dll (release / debug) must match that of the libcoreclr.so.

janvorli commented 7 years ago

I mean, using System.Private.CoreLib.dll built for 64 bit OS cannot work on 32 bit one since it has structures matching their counterparts in the native code of libcoreclr.so and their layout differs for 32 and 64 bit and also for debug and release build.
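A similar mismatch on the native side can be spotted by reading the ELF header of each `.so` file. This is a minimal sketch, assuming little-endian ELF files (which covers x64, x86 and armel). `elf_arch` is a hypothetical helper name, and it only applies to native libraries, not to managed `.dll` files.

```shell
#!/bin/sh
# Print the ELF class and machine of a file, e.g. "ELF32 ARM".
# Reads EI_CLASS (byte 4) and the low byte of e_machine (byte 18).
elf_arch() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    machine=$(od -An -tu1 -j18 -N1 "$1" | tr -d ' \n')
    case $class in
        1) bits="ELF32" ;;
        2) bits="ELF64" ;;
        *) bits="ELF?" ;;
    esac
    case $machine in
        3) name="x86" ;;
        40) name="ARM" ;;
        62) name="x86-64" ;;
        183) name="AArch64" ;;
        *) name="machine-$machine" ;;
    esac
    echo "$bits $name"
}

# Example: report every native library in a runtime folder (path is illustrative).
#   for f in shared/Microsoft.NETCore.App/2.0.0/*.so; do
#       printf '%s: %s\n' "$f" "$(elf_arch "$f")"
#   done
```

Any file reported as `ELF64 x86-64` in a runtime meant for armel would be a leftover from the original x64 package.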

alucryd commented 7 years ago

@janvorli Yes I did replace System.Private.CoreLib.dll with the one located in "bin/Product/Linux.armel.Release", every build was of the release type.

Here are the commands I used:

The managed corefx was built for x64 as per the cross-build instructions.

LukePulverenti commented 6 years ago

Just for additional info, we're observing the same situation with i386.

janvorli commented 6 years ago

One more question. When building the sources, have you built them from the same commit that the ones in the runtime tarball were built from? If not, it could also lead to the initialization issue.

alucryd commented 6 years ago

Yes, I used all 3 official 2.0.0 tarballs. Used the same tarballs to build an x64 runtime and it works fine. Only cross-building seems to introduce issues.

LukePulverenti commented 6 years ago

Hi guys, any other thoughts or things we can investigate, or do you think we're just blocked on this right now? We're trying to roll out .NET Core-based NAS packages for our media server app on lots of different NAS platforms, such as QNAP, Synology, Western Digital, etc.

As alucryd has mentioned, x64 has been smooth but armel is a bit of a problem right now. Thanks for any help you can offer.

janvorli commented 6 years ago

@alucryd I can try to repro it locally on my Raspberry Pi 3. Could you please put down a step-by-step list of what you did so that I can repeat the same process?

alucryd commented 6 years ago

Apologies for the delay.

This process worked fine for a custom x64 build, did exactly the same for armel.

LukePulverenti commented 6 years ago

By the way, our application doesn't even start, so this should be reproducible with a hello world console app. Thanks.

janvorli commented 6 years ago

I'll try to repro that locally in the next couple of days.

LukePulverenti commented 6 years ago

Did you find anything? Thanks !

janvorli commented 6 years ago

@LukePulverenti I was in Redmond the last two weeks without access to my RPI3, but now I'm back. But one more question before I dive into that. When you were copying files from the coreclr cross build, have you replaced just System.Private.CoreLib.dll or also the System.Private.CoreLib.ni.dll? The System.Private.CoreLib.ni.dll in the original SDK contains x64 native code and coreclr tries to load it if present.

alucryd commented 6 years ago

I replaced every file that was generated by cross-building coreclr and corefx. I believe that one was among them, so I should have replaced it too.

LukePulverenti commented 6 years ago

@janvorli Is this enough information to reproduce the problem? Thanks !

LukePulverenti commented 6 years ago

Any news on this? Thanks.

janvorli commented 6 years ago

@LukePulverenti I am sorry for the huge delay. I was on a paternity leave for the last month and I was too busy before that. I have actually started to look into it today.

LukePulverenti commented 6 years ago

Excellent, thank you.

janvorli commented 6 years ago

So, I've tried and made it work. I didn't try to link the standard libs statically, but that should not change anything. There are a few things though that you likely haven't done / didn't know. Here are all the steps:

  1. Fetch the latest dotnet sdk using this link: https://www.microsoft.com/net/download/thank-you/dotnet-sdk-2.1.4-linux-x64-binaries
  2. Untar it to a folder and cd into that folder
  3. Run strings shared/Microsoft.NETCore.App/2.0.5/libcoreclr.so | grep "@(#)" | grep -o "[a-f0-9]\{40\}". That gives you the commit number for coreclr repo.
  4. Run strings shared/Microsoft.NETCore.App/2.0.5/System.Net.Http.Native.so | grep "@(#)" | grep -o "[a-f0-9]\{40\}". That gives you the commit number for corefx repo.
  5. Run strings dotnet | grep -o "[a-f0-9]\{40\}". That gives you the commit number for core-setup repo.
  6. Clone the coreclr, corefx, core-setup repos. In each of them, checkout the commits you've discovered in the previous steps.
  7. In each of the repos, run sudo cross/build-rootfs.sh arm
  8. In the coreclr repo, run ./build.sh cross arm release
  9. In the corefx repo, run src/Native/build-native.sh arm release cross -portable
  10. In the core-setup repo, run src/corehost/build.sh arm release cross
  11. Now you have all the artefacts you need and we can use them to replace files in the untarred sdk
  12. Go to coreclr repo root folder and copy bin/Product/Linux.arm.Release/*.so and bin/Product/Linux.arm.Release/System.Private.CoreLib.dll to the shared/Microsoft.NETCore.App/2.0.5/ folder in the sdk
  13. Go to corefx repo root folder and copy bin/Linux.arm.Release/native/*.so to the shared/Microsoft.NETCore.App/2.0.5/ folder in the sdk
  14. Go to the core-setup repo root folder.
      a. Copy cli/exe/dotnet/dotnet to the root folder in the sdk
      b. Copy cli/dll/libhostpolicy.so to the shared/Microsoft.NETCore.App/2.0.5/ folder in the sdk
      c. Copy cli/dll/libhostpolicy.so to the sdk/2.1.4/ folder in the sdk
      d. Copy cli/fxr/libhostfxr.so to the host/fxr/2.0.5/ folder in the sdk
      e. Copy cli/fxr/libhostfxr.so to the sdk/2.1.4/ folder in the sdk
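Steps 3 to 6 above can be wrapped in small helpers. This is a minimal sketch, assuming `strings` from binutils and a `grep` that supports `-o`. The function names and the relative paths in the usage comments are hypothetical.

```shell
#!/bin/sh
# Pick the first 40-character hex commit hash from lines on stdin.
first_commit_hash() {
    grep -o '[a-f0-9]\{40\}' | head -n 1
}

# Print the commit embedded in a binary's "@(#)" version string
# (requires strings from binutils).
embedded_commit() {
    strings "$1" | grep '@(#)' | first_commit_hash
}

# Example, run from inside a coreclr clone (sdk path is illustrative):
#   git checkout "$(embedded_commit ../sdk/shared/Microsoft.NETCore.App/2.0.5/libcoreclr.so)"
# Step 5 greps the dotnet binary without the "@(#)" filter:
#   strings ../sdk/dotnet | first_commit_hash
```

Taking only the first match guards against a binary that happens to contain more than one 40-character hex string, although that choice is itself an assumption about which match is the right one.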

Now to verify that the sdk works, create a folder somewhere, cd into it and then run:

COMPlus_ReadyToRun=0 COMPlus_ZapDisable=1 /path/to/your/sdk/folder/dotnet new console

That should work. Then you can try to run the generated hello world:

COMPlus_ReadyToRun=0 COMPlus_ZapDisable=1 /path/to/your/sdk/folder/dotnet run

That should print "Hello world" to the console.

Now you may be wondering why the COMPlus_ReadyToRun=0 COMPlus_ZapDisable=1 was necessary. The reason is that the SDK you've downloaded was for x64 and some of the managed .dll files are in fact crossgen-ed (pre-jitted) for x64. Without the two env variables, dotnet fails to load them and you get the following error:

Unhandled Exception: System.BadImageFormatException: An attempt was made to load a program with an incorrect format.
 (Exception from HRESULT: 0x8007000B)
Aborted

You can get rid of the need for these env vars by re-crossgening the managed assemblies using the crossgen tool that's in the shared/Microsoft.NETCore.App/2.0.5/ folder in the sdk
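Until the assemblies are re-crossgened, the two variables can be set in one place with a small wrapper. This is a minimal sketch: `dotnet_jit_only` is a hypothetical name, and the sdk path in the usage comment is the placeholder used above.

```shell
#!/bin/sh
# Run a dotnet host with the two variables set, so the x64 crossgen-ed
# images are not used.
# Usage: dotnet_jit_only /path/to/dotnet [ARGS...]
dotnet_jit_only() {
    host=$1
    shift
    COMPlus_ReadyToRun=0 COMPlus_ZapDisable=1 "$host" "$@"
}

# Example:
#   dotnet_jit_only /path/to/your/sdk/folder/dotnet run
```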

alucryd commented 6 years ago

Great, thank you for looking into this and for the detailed instructions, I will give them a try asap!

alucryd commented 6 years ago

I gave the instructions a try, we're interested in armel rather than arm though. The official arm runtime already works fine for us, so I replaced arm with armel everywhere. BTW, the script you're pointing to in core-setup doesn't exist, instead I used:

./build.sh -TargetArchitecture=armel -ConfigurationGroup=Release -PortableBuild=true -CrossBuild=true

Unfortunately the end result is the same, spent all day trying various build flags and library versions to no avail, I'm always getting:

Failed to initialize CoreCLR, HRESULT: 0x80004005

Note that I stripped all elf files, but it shouldn't matter.

janvorli commented 6 years ago

I am sorry, the script in core-setup is src/corehost/build.sh, I've updated my comment. Let me try armel on Monday and see what I get.

janvorli commented 6 years ago

@alucryd I am unable to build corefx for armel. There are some failures due to missing openssl symbols. Before I start digging into that, I was wondering if you could share with me how you fixed that locally.

alucryd commented 6 years ago

@janvorli: I did not get that particular issue. Is it the SSLv3 missing symbols issue? I get that trying to build on arch linux since we've disabled those, but the openssl package in the debian jessie rootfs for armel should still have them.

LukePulverenti commented 6 years ago

Any news on this? Thanks !

janvorli commented 6 years ago

@alucryd the missing openssl symbols were likely due to some stale rootfs, however I am still unable to cross build corefx for armel. I've tried to build it on Ubuntu 16.04 and Ubuntu 14.04 and I keep getting the same error. I've tried to delete the rootfs and bin folders, rebuild the rootfs and then rebuild the binaries and still got the error:

[ 68%] Building CXX object System.Security.Cryptography.Native/CMakeFiles/objlib.dir/openssl.cpp.o
In file included from /home/janvorli/git/corefx/src/Native/Unix/System.Security.Cryptography.Native/openssl.cpp:17:
In file included from /home/janvorli/git/corefx/src/Native/../../cross/rootfs/armel/usr/lib/gcc/arm-linux-gnueabi/6.3.0/../../../../include/c++/6.3.0/memory:79:
In file included from /home/janvorli/git/corefx/src/Native/../../cross/rootfs/armel/usr/lib/gcc/arm-linux-gnueabi/6.3.0/../../../../include/c++/6.3.0/functional:55:
In file included from /home/janvorli/git/corefx/src/Native/../../cross/rootfs/armel/usr/lib/gcc/arm-linux-gnueabi/6.3.0/../../../../include/c++/6.3.0/tuple:39:
In file included from /home/janvorli/git/corefx/src/Native/../../cross/rootfs/armel/usr/lib/gcc/arm-linux-gnueabi/6.3.0/../../../../include/c++/6.3.0/array:39:
In file included from /home/janvorli/git/corefx/src/Native/../../cross/rootfs/armel/usr/lib/gcc/arm-linux-gnueabi/6.3.0/../../../../include/c++/6.3.0/stdexcept:39:
In file included from /home/janvorli/git/corefx/src/Native/../../cross/rootfs/armel/usr/lib/gcc/arm-linux-gnueabi/6.3.0/../../../../include/c++/6.3.0/string:52:
In file included from /home/janvorli/git/corefx/src/Native/../../cross/rootfs/armel/usr/lib/gcc/arm-linux-gnueabi/6.3.0/../../../../include/c++/6.3.0/bits/basic_string.h:5417:
In file included from /home/janvorli/git/corefx/src/Native/../../cross/rootfs/armel/usr/lib/gcc/arm-linux-gnueabi/6.3.0/../../../../include/c++/6.3.0/ext/string_conversions.h:41:
/home/janvorli/git/corefx/src/Native/../../cross/rootfs/armel/usr/lib/gcc/arm-linux-gnueabi/6.3.0/../../../../include/c++/6.3.0/cstdlib:75:15: fatal error:
      'stdlib.h' file not found
#include_next <stdlib.h>
              ^
1 error generated.

The stdlib.h exists in the rootfs and is located in the same folder as the cstdlib that fails to include it.

Here are the steps:

  1. Checkout the commit 3b42d01009ea91b988f2a03475892b99fdc2503e in corefx
  2. Create rootfs using sudo ./cross/build-rootfs.sh armel
  3. Build the native binaries using src/Native/build-native.sh cross armel release

Do these differ from what you have used?
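On the `stdlib.h` error in the log above: `#include_next` resumes the header search in the include directories that come after the one where the current file was found, so a `stdlib.h` sitting next to `cstdlib` cannot satisfy it. The compiler needs another copy later in the search path, normally the C library's own header. A minimal sketch for listing every copy in the rootfs follows. `list_header` is a hypothetical helper, and the rootfs path is the one shown in the log.

```shell
#!/bin/sh
# List every copy of a header below a root directory, sorted for stable output.
# Usage: list_header ROOT HEADER
list_header() {
    find "$1" -name "$2" | sort
}

# Example (rootfs path taken from the build log):
#   list_header cross/rootfs/armel stdlib.h
```

If only the copy under `include/c++/6.3.0` shows up, the C library headers are probably missing from the rootfs. If both show up, the order of the include directories passed to clang is the more likely suspect.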

dwmct commented 6 years ago

Following - @alucryd @janvorli any updates/progress on this?

donaldsmith2060 commented 6 years ago

Any updates?

LukePulverenti commented 6 years ago

We should probably try this all again on 2.1.

wtgodbe commented 5 years ago

@LukePulverenti have you still been seeing this issue? Or perhaps tried the same with 2.1? 2.0.0 is now End-of-life, so if the issue is only present on that release then we can consider this resolved.

LukePulverenti commented 5 years ago

We'd have to try it again with 2.2. There's a never-ending amount of devices and platforms that users want our software to run on. We more or less put this on hold and moved onto others.

dagood commented 4 years ago

Closing, it looks like this discussion is tracked at https://github.com/dotnet/runtime/issues/831 now.