cross-rs / cross

“Zero setup” cross compilation and “cross testing” of Rust crates
Apache License 2.0

Cross produces binary that segfaults in certain combinations #1512

Open ctron opened 3 weeks ago

ctron commented 3 weeks ago

This originally comes from here: https://github.com/rust-lang/git2-rs/issues/1057
The reproducer is here: https://github.com/ctron/git2-repro

The summary is: if you compile an application that uses git2-rs for aarch64-unknown-linux-gnu, cross produces a binary that segfaults (or otherwise fails). Switching the build image from latest to edge seems to fix the issue.

However, everything should be vendored anyway. I don't know what flows into the binary that makes it segfault.
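For readers who want to try the same workaround, the switch from latest to edge can be expressed in the project's Cross.toml. This is a minimal sketch, assuming the edge image is published under cross's usual `ghcr.io/cross-rs/<target>:<tag>` naming; the exact configuration used in the reproducer is not shown in this thread.

```toml
# Cross.toml (sketch): pin the aarch64 target to the edge build image
# instead of the default image shipped with the current cross release.
# The image reference below is an assumption based on cross's image naming scheme.
[target.aarch64-unknown-linux-gnu]
image = "ghcr.io/cross-rs/aarch64-unknown-linux-gnu:edge"
```

With this file next to Cargo.toml, `cross build --target aarch64-unknown-linux-gnu` should pick up the edge image for that target only, leaving other targets on their default images.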

Emilgardis commented 3 weeks ago

This could be due to using an old gcc/g++ and/or old glibc headers. Do you have a gdb backtrace/stack for the segfault?

ctron commented 3 weeks ago

> This could be due to using an old gcc/g++ and/or old glibc headers. Do you have a gdb backtrace/stack for the segfault?

Yes, I actually have two: https://github.com/ctron/git2-repro?tab=readme-ov-file#faults (or see below).

It seems to be triggered only with bigger repositories, and then it behaves differently each time. It feels like one of those "some part of the application writes into random memory" moments that C might give you. Though most of it should be vendored anyway.

https://github.com/CVEProject/cvelistV5.git

[Current thread is 1 (Thread 0xffffb04e1f90 (LWP 268297))]
(gdb) where
#0  0x0000aaaab179252c in git_commit_list_insert_by_date ()
#1  0x0000aaaab1747d30 in git_revwalk__push_commit ()
#2  0x0000aaaab174828c in git_revwalk__push_glob ()
#3  0x0000aaaab176da10 in git_smart__negotiate_fetch ()
#4  0x0000aaaab1795c78 in git_fetch_negotiate ()
#5  0x0000aaaab173d4bc in git_remote__download ()
#6  0x0000aaaab173e75c in git_remote_fetch ()
#7  0x0000aaaab158ae04 in git2::remote::Remote::fetch ()
#8  0x0000aaaab1588dc0 in tracing::span::Span::in_scope ()
#9  0x0000aaaab15790b4 in git2_repro::run ()
#10 0x0000aaaab157f8cc in tokio::runtime::task::core::Core<T,S>::poll ()
#11 0x0000aaaab157fdbc in tokio::runtime::task::harness::Harness<T,S>::poll ()
#12 0x0000aaaab1641068 in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#13 0x0000aaaab1649b18 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#14 0x0000aaaab1a402b0 in <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at library/alloc/src/boxed.rs:2020
#15 <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once () at library/alloc/src/boxed.rs:2020
#16 std::sys::pal::unix::thread::Thread::new::thread_start () at library/std/src/sys/pal/unix/thread.rs:108
#17 0x0000ffffb0d33624 in start_thread (arg=0xaaaab1a40284 <std::sys::pal::unix::thread::Thread::new::thread_start>) at pthread_create.c:477
#18 0x0000ffffb0bcb62c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
Warning: the current language does not match this frame.

https://github.com/NixOS/nixpkgs

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `./git2-repro --path test2 --source https://github.com/NixOS/nixpkgs'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0xffff9238af90 (LWP 267827))]
(gdb) where
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000ffff929c3aac in __GI_abort () at abort.c:79
#2  0x0000ffff92a10f40 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0xffff92ad26a8 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x0000ffff92a18344 in malloc_printerr (str=str@entry=0xffff92ace160 "free(): invalid size") at malloc.c:5347
#4  0x0000ffff92a19b98 in _int_free (av=0xffff92b11a98 <main_arena>, p=0xffff7ab9aff0, have_lock=<optimized out>) at malloc.c:4177
#5  0x0000aaaac99aef18 in git_signature_free ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Emilgardis commented 3 weeks ago

I think then that it's a miscompilation somewhere. I'm not sure how much we need to dig into this, seeing that the edge images work; they will be promoted in the 0.3.0 release, which I hope to get out this summer during my vacation.

The only thing that differs between the images is the build environment, not the Rust toolchain (we don't provide Rust in the images).

ctron commented 3 weeks ago

Yeah, given that the edge images work, that's ok-ish. I was wondering how hard it would be to add an additional image to the current release. I tried to come up with a centos9 image, but found the containers rather complex and gave up. Maybe that's something to improve in the future. But it's definitely out of scope for this issue.