dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[mono] Mono coredumps when built with `_FORTIFY_SOURCE=2` forced. #82269

Closed ayakael closed 1 month ago

ayakael commented 1 year ago

Description

Building runtime on current edge of Alpine Linux using mono generates a broken runtime.

Reproduction Steps

On Alpine Linux Edge, build a mono-flavored runtime, and try to execute ./dotnet build. Core dump will occur.

Expected behavior

Good execution

Actual behavior

Coredump

Regression?

Bug occurs on dotnet6 as well.

Known Workarounds

Undefining _FORTIFY_SOURCE in src/mono/mono/utils/mono-threads-coop.c

Configuration

7.0.103 and 6.0.114 Alpine Linux Edge s390x and ppc64le, presumably any runtime built with mono-flavor

Other information

Running dotnet build through gdb yields the following:

0x000003fffd70fc8e in ?? () at /home/build/aports/community/dotnet7-build/src/dotnet-v7.0.103/src/runtime/artifacts/source-build/self/src/src/mono/mono/utils/mono-threads-coop.c:123 from /home/build/aports/community/dotnet7-build/src/dotnet-bunny-12/release/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
123             g_assert (mono_threads_is_blocking_transition_enabled ());

Runtimes built previously still work, so something in the current build environment causes the newly built runtime to fail at src/mono/mono/utils/mono-threads-coop.c, line 123.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-io See info in area-owners.md if you want to be subscribed.

Issue Details
### Description

Building runtime on current edge of Alpine Linux using mono generates a broken runtime.

### Reproduction Steps

On Alpine Linux Edge, build a mono-flavored runtime, and try to execute `./dotnet build`. Core dump will occur.

### Expected behavior

Good execution

### Actual behavior

Coredump

### Regression?

Bug occurs on dotnet6 as well.

### Known Workarounds

Building on Alpine Linux v3.17

### Configuration

7.0.103 and 6.0.114 Alpine Linux Edge s390x and ppc64le, presumably any runtime built with mono-flavor

### Other information

Running `dotnet build` through `gdb` yields the following:

```
0x000003fffd70fc8e in ?? () at /home/build/aports/community/dotnet7-build/src/dotnet-v7.0.103/src/runtime/artifacts/source-build/self/src/src/mono/mono/utils/mono-threads-coop.c:123 from /home/build/aports/community/dotnet7-build/src/dotnet-bunny-12/release/shared/Microsoft.NETCore.App/7.0.3/libcoreclr.so
123     /home/build/aports/community/dotnet7-build/src/dotnet-v7.0.103/src/runtime/artifacts/source-build/self/src/src/mono/mono/utils/mono-threads-coop.c: No such file or directory.
```

This thus points to a library issue. The following diff compares the library versions of last successful build and this one:

```diff
< readline(8.2.0-r0)
> ncurses-terminfo-base(6.4_p20230211-r0)
> ncurses-libs(6.4_p20230211-r0)
> readline(8.2.001-r0)
< xz-libs(5.4.0-r1)
> xz-libs(5.4.1-r0)
< llvm15-libs(15.0.6-r1)
< clang15-libs(15.0.6-r2)
< clang15(15.0.6-r2)
> llvm15-libs(15.0.7-r0)
> clang15-libs(15.0.7-r2)
> clang15(15.0.7-r2)
> lz4-libs(1.9.4-r1)
< cmake(3.25.1-r0)
> cmake(3.25.2-r0)
< dotnet7-build(7.0.101-r0)
< dotnet7-artifacts(7.0.101-r0)
> dotnet7-build(7.0.102-r0)
> dotnet7-artifacts(7.0.102-r0)
< findutils(4.9.0-r2)
> findutils(4.9.0-r3)
< icu-data-full(72.1-r1)
< icu-libs(72.1-r1)
< icu(72.1-r1)
< icu-dev(72.1-r1)
> icu-data-full(72.1-r2)
> icu-libs(72.1-r2)
> icu(72.1-r2)
> icu-dev(72.1-r2)
< libblkid(2.38.1-r2)
< libuuid(2.38.1-r2)
< libfdisk(2.38.1-r2)
< libmount(2.38.1-r2)
< libsmartcols(2.38.1-r2)
< util-linux-dev(2.38.1-r2)
> libblkid(2.38.1-r4)
> libuuid(2.38.1-r4)
> libfdisk(2.38.1-r4)
> libmount(2.38.1-r4)
> libsmartcols(2.38.1-r4)
> util-linux-dev(2.38.1-r4)
< libcom_err(1.46.5-r5)
< e2fsprogs-libs(1.46.5-r5)
< e2fsprogs-dev(1.46.5-r5)
> libcom_err(1.47.0-r0)
> e2fsprogs-libs(1.47.0-r0)
> e2fsprogs-dev(1.47.0-r0)
< glib(2.74.4-r0)
> glib(2.74.5-r0)
< libldap(2.6.3-r6)
> libldap(2.6.4-r0)
< openssl-dev(3.0.7-r2)
< nghttp2-dev(1.51.0-r0)
> openssl-dev(3.0.8-r0)
> nghttp2-dev(1.52.0-r0)
< curl-dev(7.87.0-r2)
> curl-dev(7.88.0-r1)
< libgit2(1.5.0-r2)
< bsd-compat-headers(0.7.2-r3)
< ncurses-dev(6.4_p20230107-r0)
> libgit2(1.5.1-r0)
> bsd-compat-headers(0.7.2-r4)
> ncurses-dev(6.4_p20230211-r0)
> libedit(20221030.3.1-r0)
< libgit2-dev(1.5.0-r2)
> libgit2-dev(1.5.1-r0)
< xz(5.4.0-r1)
< xz-dev(5.4.0-r1)
> xz(5.4.1-r0)
> xz-dev(5.4.1-r0)
< linux-headers(6.1.0-r0)
> linux-headers(6.1.11-r0)
< python3(3.11.1-r2)
< lldb(15.0.6-r0)
< lldb-dev(15.0.6-r0)
< llvm15(15.0.6-r1)
> python3(3.11.2-r0)
> lldb(15.0.7-r0)
> lldb-dev(15.0.7-r0)
> llvm15(15.0.7-r0)
< zstd(1.5.2-r10)
< zstd-dev(1.5.2-r10)
> zstd(1.5.4-r0)
> zstd-dev(1.5.4-r0)
< py3-setuptools(65.6.3-r0)
< llvm15-test-utils(15.0.6-r1)
< llvm15-dev(15.0.6-r1)
< userspace-rcu(0.13.2-r0)
< userspace-rcu-dev(0.13.2-r0)
> py3-setuptools(67.3.2-r0)
> llvm15-test-utils(15.0.7-r0)
> llvm15-dev(15.0.7-r0)
> userspace-rcu(0.14.0-r0)
> userspace-rcu-dev(0.14.0-r0)
< nodejs(18.12.1-r0)
< numactl(2.0.16-r0)
< numactl-dev(2.0.16-r0)
> c-ares(1.19.0-r0)
> nodejs(18.14.0-r0)
> numactl(2.0.16-r2)
> numactl-dev(2.0.16-r2)
> popt(1.19-r1)
> rsync(3.2.7-r0)
< skalibs(2.12.0.1-r0)
< utmps-libs(0.1.2.0-r1)
> skalibs(2.13.0.0-r0)
> utmps-libs(0.1.2.1-r0)
< npm(9.2.0-r0)
> npm(9.4.2-r0)
< libelf(0.188-r0)
> libelf(0.188-r1)
< util-linux-misc(2.38.1-r2)
> setarch(2.38.1-r4)
> util-linux-misc(2.38.1-r4)
< inetutils-syslogd-openrc(2.4-r1)
```

Runtimes built previously still work, so something occurs during build for linking to fail with `src/mono/mono/utils/mono-threads-coop.c` on line 123. Any library updates that look suspicious?
Author: ayakael
Assignees: -
Labels: `area-System.IO`
Milestone: -
ayakael commented 1 year ago

Of note, both builds use the backported version of https://github.com/dotnet/runtime/pull/76500. Indeed, without this, mono flavored runtime doesn't even build on musl.

ayakael commented 1 year ago

Further investigation reveals a more telling error:

Reading symbols from ./dotnet...
(gdb) run
Starting program: /var/build/dotnet7/community/dotnet7-stage0/src/test/dotnet build

Program received signal SIGILL, Illegal instruction.
copy_stack_data_internal (stackdata_begin=0x7fffffffdb98, info=<optimized out>, wrapper_data1=<optimized out>, wrapper_data2=<optimized out>) at /var/build/dotnet7/community/dotnet7-stage0/src/dotnet-v7.0.101-source-build/src/runtime/src/mono/mono/utils/mono-threads-coop.c:193
193     memcpy (state->gc_stackdata, stackdata_end, stackdata_size);

A similar bug occurred in another package I maintain, and was brought to the surface by https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/43463, which makes clang15 enable fortify-source by default. The actual bug is a target mismatch bug introduced in gcc12. It is tracked here: https://gitlab.alpinelinux.org/alpine/aports/-/issues/14105

Applying the following patch does the trick as a workaround:

From 98054ea87ce70247bb09ceafd2ad1a0b36d2fef4 Mon Sep 17 00:00:00 2001
Patch-Source: https://github.com/dotnet/runtime/issues/82269
From: Antoine Martin <dev@ayakael.net>
Date: Sat, 1 Oct 2022 09:21:58 -0400
Subject: [PATCH] Undefine fortify-source on mono-thread-coop

When _FORTIFY_SOURCE=2, there is a bug relating to memcpy that expresses itself.
See: https://gitlab.alpinelinux.org/alpine/aports/-/issues/14105. Alpine Linux
now sets this by default since https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/43463,
which makes mono-flavored runtime dump its core. This patch offers a workaround
by undefining _FORTIFY_SOURCE in the problematic file.

---
diff --git a/src/runtime/src/mono/mono/utils/mono-threads-coop.c b/src/runtime/src/mono/mono/utils/mono-threads-coop.c
index 4ed659d6605..34bb5785fba 100644
--- a/src/runtime/src/mono/mono/utils/mono-threads-coop.c
+++ b/src/runtime/src/mono/mono/utils/mono-threads-coop.c
@@ -15,6 +15,7 @@
 #ifdef TARGET_MACH
 #define _DARWIN_C_SOURCE
 #endif
+#undef _FORTIFY_SOURCE

 #include <mono/utils/mono-compiler.h>
 #include <mono/utils/mono-threads.h>
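
The position of the `#undef` in the patch above matters: it sits before the first `#include`, so the libc headers are read with the macro already gone. A minimal sketch of that ordering, assuming a libc whose headers consult `_FORTIFY_SOURCE` when they are included; `fortify_requested` is a hypothetical probe and not part of Mono:

```c
/* Must come before any libc header is included in this translation unit. */
#undef _FORTIFY_SOURCE

#include <assert.h> /* only needed for the check in the usage note */
#include <string.h>

/* Hypothetical probe: reports whether the macro is still defined at this
 * point in the file, i.e. whether fortified wrappers were requested. */
int fortify_requested(void)
{
#ifdef _FORTIFY_SOURCE
    return 1;
#else
    return 0;
#endif
}
```

Calling `assert(fortify_requested() == 0);` confirms the macro is no longer visible in this file, even when the compiler or build flags define it globally. Other source files are unaffected.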
nekopsykose commented 1 year ago

copying what i wrote downstream for reference:

it's not; this is an actual fortify violation. clang is also not affected by that inlining issue, which manifests as a compilation failure, not a runtime sigill: that issue is about gcc failing to compile code at all in certain situations with fortify. here the code is wrong, and crashes on the fortify assertion. a failed fortify check hits a trap instruction (e.g. ud2 on x86, which raises SIGILL), because the code did something incorrect. so, strictly speaking, in this case it is (probably) not a compiler bug and undefining fortify is not correct. the code itself does something wrong in copy_stack_data_internal with the memcpy. the most common cause is that it fails to uphold the requirement for memcpy: the source and destination must not overlap.

from man 3 memcpy:

 DESCRIPTION
        The memcpy() function copies n bytes from memory area src to memory
        area dest.  The memory areas must not overlap.  Use memmove(3) if the
        memory areas do overlap.

if you change that memcpy to memmove on line 193, it will probably pass. and if you write some sample code where the memory overlaps, you will reproduce the same crash (with either gcc or clang utilising the fortify headers). they're working as intended :)
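
A minimal sketch of the overlap rule quoted from the man page, for illustration only: `shift_right` is a hypothetical helper, not code from Mono. Shifting bytes within one buffer makes the source and destination overlap, which is the case `memmove` is specified to handle and `memcpy` is not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: shift the first `len` bytes of `buf` right by `by`
 * positions. Source and destination overlap whenever by < len, so memmove
 * is required; memcpy would be undefined behaviour for those calls. */
void shift_right(char *buf, size_t cap, size_t len, size_t by)
{
    /* The shifted bytes must still fit inside the buffer. */
    assert(by <= cap && len <= cap - by);
    memmove(buf + by, buf, len);
}
```

For example, with `char buf[16] = "abcdef";`, calling `shift_right(buf, sizeof buf, 6, 2)` leaves `"abcdef"` starting at `buf + 2`, with the first two bytes unchanged.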

ayakael commented 1 year ago

Changing memcpy to memmove still yields an error:

Reading symbols from ./dotnet...
(gdb) run
Starting program: /var/build/dotnet7/community/dotnet7-stage0/src/test/dotnet build

Program received signal SIGILL, Illegal instruction.
copy_stack_data_internal (stackdata_begin=0x7fffffffd848, info=<optimized out>, wrapper_data1=<optimized out>, wrapper_data2=<optimized out>) at /var/build/dotnet7/community/dotnet7-stage0/src/dotnet-v7.0.101-source-build/src/runtime/src/mono/mono/utils/mono-threads-coop.c:193
193             memmove (state->gc_stackdata, stackdata_end, stackdata_size);
nekopsykose commented 1 year ago

the fortified memmove:

_FORTIFY_FN(memmove) void *memmove(void * _FORTIFY_POS0 __d,
                                   const void * _FORTIFY_POS0 __s, size_t __n)
{
    size_t __bd = __builtin_object_size(__d, 0);
    size_t __bs = __builtin_object_size(__s, 0);

    if (__n > __bd || __n > __bs)
        __builtin_trap();
    return __orig_memmove(__d, __s, __n);
}

__builtin_object_size returns (size_t)-1 when it cannot determine the size, and in that case the check passes. meaning, the assertion is real, and the __n is probably wrong

the fortified memcpy is the same, with an extra overlap check.
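
The reasoning above can be checked in isolation. The sketch below restates the wrapper's trap condition as an ordinary function; `would_trap`, `check_examples` and `UNKNOWN_OBJECT_SIZE` are hypothetical names introduced here, and the sizes in the examples are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* What __builtin_object_size(p, 0) yields when the size is unknown. */
#define UNKNOWN_OBJECT_SIZE ((size_t)-1)

/* Hypothetical helper: the same condition the wrapper tests before trapping. */
int would_trap(size_t n, size_t dest_size, size_t src_size)
{
    return n > dest_size || n > src_size;
}

void check_examples(void)
{
    /* Unknown sizes can never trip the check: no n exceeds (size_t)-1. */
    assert(!would_trap(4096, UNKNOWN_OBJECT_SIZE, UNKNOWN_OBJECT_SIZE));
    /* A source known to be 8 bytes with a longer copy length trips it. */
    assert(would_trap(64, UNKNOWN_OBJECT_SIZE, 8));
    /* A copy that fits both objects passes. */
    assert(!would_trap(16, 16, 16));
}
```

So a trap implies that the compiler did determine a size for at least one of the two pointers, and that the requested length exceeds it.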

marek-safar commented 1 year ago

@lambdageek @vargaz what do you think?

SamMonoRT commented 1 year ago

Moving this to 9.0.0, but will consider backporting a fix for the issue

SamMonoRT commented 1 year ago

Assigning to @lambdageek for tracking, we'll tackle once back from vacation

lambdageek commented 1 year ago

Mono's behaviour is intentional: we need to copy a chunk of the stack for the benefit of the GC.

So assuming we're calculating the bounds correctly (which may include the frames of several callers and possibly a red zone - I don't recall), the work here will be to convince FORTIFY_SOURCE that we're doing it on purpose and not to flag us. (I'm not sure if that's possible, so I'd welcome help from someone who has experience with it)

lambdageek commented 1 year ago

I mean it's also possible we don't calculate the size of the destination buffer correctly. I don't mean to imply we're smarter than the compiler. Just that we're doing something intentionally sus. But maybe we messed it up. We need to investigate
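
To make the two possibilities concrete, here is a minimal sketch of copying a byte range delimited by two addresses while validating the computed length against the destination's capacity. It is not Mono's implementation: `copy_range` and its parameters are hypothetical, and an ordinary array stands in for the stack in the usage note.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: copy the bytes in [low, high) into `dest`.
 * Refuses when the range is inverted or larger than `dest_cap`.
 * Returns the number of bytes copied, or 0 when nothing was copied. */
size_t copy_range(void *dest, size_t dest_cap, const char *low, const char *high)
{
    assert(dest != NULL && low != NULL && high != NULL);
    if (high < low)
        return 0;                     /* bounds computed in the wrong order */
    size_t n = (size_t)(high - low);
    if (n > dest_cap)
        return 0;                     /* destination buffer too small */
    memcpy(dest, low, n);
    return n;
}
```

With a 64-byte array `fake_stack` and a 32-byte buffer `saved`, `copy_range(saved, sizeof saved, fake_stack + 8, fake_stack + 24)` copies 16 bytes, while asking for the whole 64-byte range is refused. A guard like this only covers the destination side; it does not change what the compiler believes about the size of the source object.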

steveisok commented 1 month ago

Since we will no longer be shipping desktop configurations for mono (https://github.com/dotnet/docs/issues/41366), it is not likely we will work on this issue.