dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Publish NativeAOT on Ubuntu doesn't work with -r linux-musl-x64 #92294

Open eerhardt opened 1 year ago

eerhardt commented 1 year ago

Description

If someone tries to publish for NativeAOT from Ubuntu to Alpine, the publish "works", but the app doesn't run.

Reproduction Steps

On an Ubuntu machine:

  1. dotnet new webapiaot
  2. dotnet publish -t:PublishContainer -r linux-musl-x64 -p:ContainerBaseImage=alpine:latest

Try to run your new Alpine container, and it fails:

exec /app/myapp: no such file or directory

Expected behavior

I would expect either:

  1. The app should work -or-
  2. I get a publish error that this isn't supported

Actual behavior

I don't get an error. And the app doesn't work.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

cc @baronfel @agocke @MichalStrehovsky @richlander

ghost commented 1 year ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: eerhardt
Assignees: -
Labels: `untriaged`, `area-NativeAOT-coreclr`
Milestone: -

agocke commented 1 year ago

Is this unique to PublishContainer? Or does it also repro separately?

MichalStrehovsky commented 1 year ago

Is the problem that the native toolchain we picked up is not the cross toolchain? I.e., is this a cross-compilation that is misconfigured, so the native linker and native libraries we use come from the host system? What's the output if you add <ItemGroup><LinkerArg Include="-v" /></ItemGroup> to the project file?

If a misconfigured cross toolchain is the problem, this is similar to #88942. The issue is that we don't control the native build tools; the user needs to configure them. It is very hard for us to check that the environment is properly configured. This is Linux after all: there are a bazillion ways a valid native toolchain could look. We could smoke test the output of the linker, and if the binary depends on glibc, it's probably not musl, but that's a bit of an overreach.

We'll not be able to solve this unless we start shipping our own native toolchain and sysroot instead of relying on the user to configure this.
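For anyone who wants to run that kind of smoke test by hand, here is a minimal sketch. It is not something the SDK does. It assumes readelf (from binutils) is available, and the function names and the binary path in the usage comment are hypothetical. It looks at the program interpreter recorded in the ELF file: musl binaries request a loader such as /lib/ld-musl-x86_64.so.1, while glibc binaries request /lib64/ld-linux-x86-64.so.2, which does not exist on Alpine.

```shell
#!/bin/sh
# Classify a program-interpreter path as musl, glibc, or unknown.
# (Helper names here are illustrative, not part of any .NET tooling.)
classify_interp() {
  case "$1" in
    */ld-musl-*) echo musl ;;
    */ld-linux*) echo glibc ;;
    *)           echo unknown ;;
  esac
}

# Extract the interpreter path from `readelf -l` output read on stdin.
interp_from_readelf() {
  sed -n 's/.*Requesting program interpreter: \([^]]*\)].*/\1/p'
}

# Usage (the binary path is a placeholder):
#   classify_interp "$(readelf -l ./publish/myapp | interp_from_readelf)"
```

If this prints glibc for a binary published with -r linux-musl-x64, the link step most likely used the host's libraries rather than a musl sysroot.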

eerhardt commented 1 year ago

> Is this unique to PublishContainer? Or does it also repro separately?

I don't believe it is unique to PublishContainer. To try it, you can just dotnet publish -r linux-musl-x64 on your Ubuntu machine, put the executable into an alpine container and try to run it.
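The steps just described could be scripted roughly as follows. This is a sketch under assumptions: the output path follows the SDK's default bin/<Configuration>/<TFM>/<RID>/publish layout, and the app name myapp, the net8.0 target framework, and the helper names are placeholders rather than anything confirmed in this thread.

```shell
#!/bin/sh
# Build the default publish output path for a project.
# Assumption: default SDK layout bin/<Configuration>/<TFM>/<RID>/publish.
publish_dir() {
  printf 'bin/%s/%s/%s/publish\n' "$1" "$2" "$3"
}

# Hypothetical repro: publish on the Ubuntu host, then run the result
# inside a stock Alpine container. "myapp" and net8.0 are placeholders.
repro() {
  dotnet publish -r linux-musl-x64 -p:PublishAot=true -c Release &&
  docker run --rm \
    -v "$PWD/$(publish_dir Release net8.0 linux-musl-x64):/app" \
    alpine:latest /app/myapp
}
```

Calling repro from the project directory on a host without a musl cross toolchain should show whether the failure reproduces without PublishContainer.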

MichalPetryka commented 1 year ago

> We'll not be able to solve this unless we start shipping our own native toolchain and sysroot instead of relying on the user to configure this.

Are there any plans to make that the default now?

MichalStrehovsky commented 1 year ago

> > We'll not be able to solve this unless we start shipping our own native toolchain and sysroot instead of relying on the user to configure this.
>
> Are there any plans to make that the default now?

We don't even have plans to "make that", so no plans to make that the default either :).

MichalStrehovsky commented 1 year ago

I had a look at this. I don't know if there's anything we can do here.

I thought that maybe if clang is used as the linker, we could force passing --target=x86_64-alpine-linux-musl to clang and clang would error out if this is not a matching sysroot. But we already do pass --target=aarch64-alpine-linux-musl when doing arm64 crosscompile and that one fails with:

$ dotnet publish -r linux-musl-arm64
  /usr/bin/ld.bfd: unrecognised emulation mode: aarch64linux
  Supported emulations: elf_x86_64 elf32_x86_64 elf_i386 elf_iamcu elf_l1om elf_k1om i386pep i386pe
clang : error : linker command failed with exit code 1 (use -v to see invocation) [/home/michals/net8test/net8test.csproj]

I.e., clang didn't do anything to check whether we're actually configured for arm64 musl and just passed the host system's libs to the linker along with our arm64 object file. This failed to link for obvious reasons. This means that even clang doesn't care about, or cannot validate, whether we can actually target the thing we try to target in the current configuration.

The only way I see we could make this fail to link is if we reference some symbol that only exists in musl libc so that an attempt to link against glibc fails. The hello world produced with -r linux-musl-x64 actually works fine on Ubuntu because we successfully linked it with glibc from the host system.

giggio commented 1 year ago

If the problem is that the native toolchain needs to be correctly configured, this should be somewhere in the docs, and the CLI should handle it if the toolchain is misconfigured. The user is not expected to know that using -r linux-musl-x64 will not work.

That said, have you taken a look at what the developers did with Cross? It is a Docker-based approach and it just works.

I'm actually going to try building with an Alpine SDK container image now to see if it works... If that works, getting it into a tool shouldn't be that hard. It would also solve other platforms, like arm64.

giggio commented 1 year ago

Ok, I got it to work on Alpine. This is a simple Bash script that compiles with the help of containers. It builds native AOT binaries in the app's directory (notice the -v mount of the app directory in docker run) and adds them to a dotnet/runtime-deps:8.0-alpine based image using dotnet's container tooling. I'm removing symbols to make the image smaller, so -p:DebugType=none is optional. This is just a quick script; it could be improved by prebuilding a base image with the apk packages. Still, it takes less than 10 seconds to add them, so that's not really a big problem.

#!/bin/bash

set -euo pipefail
# Directory containing this script; mounted into the container as /app.
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

# Start an SDK container, install the native toolchain, then signal readiness via /.done.
CONTAINER_ID=$(docker run -d -v /var/run/docker.sock:/var/run/docker.sock -v "$DIR":/app mcr.microsoft.com/dotnet/sdk:8.0-alpine sh -c "mkdir /home/user && chown $(id -u):$(id -g) /home/user && apk add clang build-base zlib-dev docker-cli && touch /.done && sleep infinity")
# Wait until the packages are installed, printing the container logs in the meantime.
while ! docker exec "$CONTAINER_ID" ls /.done > /dev/null 2>&1; do docker logs "$CONTAINER_ID"; sleep 2; done
# Publish as the host user so the build output in /app is not owned by root.
docker exec -ti --workdir /app --user "$(id -u):$(id -g)" -e HOME=/home/user "$CONTAINER_ID" sh -c 'dotnet publish -r linux-musl-x64 -t:PublishContainer -p:ContainerBaseImage=mcr.microsoft.com/dotnet/runtime-deps:8.0-alpine -p:ContainerImageTag=latest -p:PublishAot=true -p:DebugType=none -c Release'
# Remove the build container.
docker rm -f "$CONTAINER_ID"

As I mentioned before, this could easily be built into a tool. Doing a dotnet-cross build -r linux-musl-x64 would then do the right thing.
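One piece such a tool would need is a mapping from the requested runtime identifier to a build image. The sketch below is only an illustration of that idea, not how dotnet-cross is implemented: the function name is hypothetical, and it assumes musl RIDs build in the Alpine SDK image used in the script above while other Linux RIDs build in the default mcr.microsoft.com/dotnet/sdk:8.0 image. It ignores the target architecture, which a real tool would also have to handle.

```shell
#!/bin/sh
# Pick an SDK build image for a runtime identifier (RID).
# Assumption: musl RIDs build inside the Alpine SDK image, other Linux
# RIDs in the default SDK image. Architecture is not considered here.
sdk_image_for_rid() {
  case "$1" in
    linux-musl-*) echo "mcr.microsoft.com/dotnet/sdk:8.0-alpine" ;;
    linux-*)      echo "mcr.microsoft.com/dotnet/sdk:8.0" ;;
    *)            echo "unsupported RID: $1" >&2; return 1 ;;
  esac
}
```

For example, sdk_image_for_rid linux-musl-x64 selects the same image the script above starts with docker run.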

giggio commented 1 year ago

Ok, I built dotnet-cross, install instructions:

https://github.com/giggio/dotnet-cross#installing-dotnet-cli-tool

TLDR:

dotnet tool install --global dotnet-cross

It requires Docker to build.

@eerhardt, just install it and run:

dotnet cross publish -t:PublishContainer -r linux-musl-x64 -p:ContainerBaseImage=alpine:latest

The only change is the word cross added to the command line.

The build of the tool is reproducible and done by GitHub Actions, so you can verify everything. The .nupkg will become smaller when Docker.DotNet drops Json.NET.