pavelzw opened 1 month ago
It appears that the same issue is occurring with `jnv`. See https://github.com/conda-forge/jnv-feedstock/pull/2. Is this an issue with the clang toolchain? @conda-forge/clang-compiler-activation
This seems to me more likely a question of the Rust setup, as it's happening during Rust compilation. The log line
clang-16: error: unable to execute command: Segmentation fault: 11
doesn't tell me anything about what failed.
In the `jnv` failure there's at least one more tidbit:
= note: ld: warning: directory not found for option '-L$PREFIX/lib'
which could be explained by the fact that the `jnv` recipe specifies no host environment.
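For context on that warning: a recipe that declares host dependencies gets a populated `$PREFIX` at build time, so the `-L$PREFIX/lib` path exists; a recipe with no host section may not. A hypothetical fragment (the package names are illustrative, not taken from the actual `jnv` recipe):

```yaml
# meta.yaml (illustrative fragment, not the actual jnv recipe)
requirements:
  build:
    - {{ compiler('rust') }}
  host:
    - openssl   # any host dependency leads to $PREFIX being created and populated
```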
Now `pixi-pack` is also failing, see https://github.com/conda-forge/pixi-pack-feedstock/pull/6
Edit:
- `pixi`: https://github.com/conda-forge/pixi-feedstock/pull/58
- `tokenizers`: https://github.com/conda-forge/tokenizers-feedstock/pull/70
- `arro3-compute`: https://github.com/conda-forge/staged-recipes/pull/27209 / https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=998327&view=logs&jobId=e35d4f76-8ff2-5536-d795-df91e63eb9f7&j=e35d4f76-8ff2-5536-d795-df91e63eb9f7&t=fa7b4b17-b6ff-5c9c-8cfc-15f888c92310
- `bytewax`: https://github.com/conda-forge/bytewax-feedstock/pull/28
CC: @conda-forge/rust-activation
I reran the CI for https://github.com/conda-forge/jnv-feedstock/pull/2 and it seems to pass now. Not sure if https://github.com/conda-forge/rust-activation-feedstock/pull/60 had something to do with it. The release notes mention a miscompilation when comparing floats; no idea if that's applicable in the cases above.
Still running into the same issue on https://github.com/conda-forge/tokenizers-feedstock/pull/70 for the Python 3.8 macOS build.
Interesting that this only happens with py38; that looks like it involves at least one other problem. Since py38 is about to be dropped in 2 months, I'd be fairly liberal in dropping those builds though. 🤷
With @h-vetinari's help, we got https://github.com/conda-forge/tokenizers-feedstock/pull/70 at least resolved and merged, but I did see CI fail for macOS on Python 3.12 at least once, so I suspect this is likely flaky and maybe broader than just Python 3.8. It could also be caching of an older image at some layer causing it, but I'm not confident.
Closed by conda-forge/polars-feedstock#253, conda-forge/polars-feedstock#257 and conda-forge/polars-feedstock#261
Should we move this to the rust(-activation) feedstock?
Has anyone tried using `conda-forge/label/rust_dev` to see if it's just a bad version on conda-forge?
> Has anyone tried using `conda-forge/label/rust_dev` to see if it's just a bad version on conda-forge?
Yes, Polars is using `rust_dev` and still experienced this issue when not patching the `syn` version.
Is there any update with regard to this issue? Any workarounds?
Thus far I've had success downgrading to the upstream's MSRV, where it is documented or can be inferred from their CI, e.g.:
# conda_build_config.yaml
rust_compiler_version: # [osx]
- "1.74" # [osx]
Some feedstocks (like `uv`) haven't had any issues with the latest version on `main`. Without a fruitbox, or particularly deep knowledge of Rust, this is the best I've been able to do.
This only happens with `osx-64`, not if the build is run using `osx-arm64`.
Sadly, `lldb` is quite silent:
% lldb /Users/uwe/Development/conda-forge/ruff-feedstock/miniforge3/conda-bld/ruff_1725476454375/_build_env/bin/x86_64-apple-darwin13.4.0-clang -c /cores/core.64352
(lldb) target create "/Users/uwe/Development/conda-forge/ruff-feedstock/miniforge3/conda-bld/ruff_1725476454375/_build_env/bin/x86_64-apple-darwin13.4.0-clang" --core "/cores/core.64352"
Core file '/cores/core.64352' (arm64) was loaded.
(lldb) bt
* thread #1, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x726f63344e5a5fbf)
* frame #0: 0x0000000102984c94
(lldb) bt all
* thread #1, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x726f63344e5a5fbf)
* frame #0: 0x0000000102984c94
thread #2
frame #0: 0x00007ff7ffdd3414
frame #1: 0x00007ff7ffddf164
frame #2: 0x00007ff7ffde0a80
(lldb)
I was able to poke around a bit. The culprit seems to be the `-Wl,-exported_symbols_list,...` argument: if I rerun the link step without that argument, there is no crash.
In my particular sample build of pixi, the contents of that list file are trivial:
___rustc_proc_macro_decls_9059548e8b173a1d__
_rust_metadata_darling_macro_9059548e8b173a1d
(I was able to debug this by manually activating the build environment using `build_env_setup.sh`, emulating the setup of the recipe's `build.sh`, and then setting `$CARGO_TARGET_X86_64_APPLE_DARWIN_LINKER` to a one-off wrapper shell script.)
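That wrapper approach can be sketched roughly like this (the script path, log location, and `REAL_LINKER` fallback are all illustrative, not taken from the actual build):

```shell
# Write a hypothetical one-off wrapper that logs every link invocation
# before delegating to the real linker driver.
cat > /tmp/link-wrapper.sh <<'EOF'
#!/bin/bash
# Append the full argument list, one per line, so the failing link
# command can be replayed and edited by hand later.
printf '%s\n' "$@" >> /tmp/link-args.log
# REAL_LINKER is a placeholder; in the real build this would be the
# activated environment's x86_64-apple-darwin13.4.0-clang.
exec "${REAL_LINKER:-clang}" "$@"
EOF
chmod +x /tmp/link-wrapper.sh

# Then, with the build environment activated:
#   export CARGO_TARGET_X86_64_APPLE_DARWIN_LINKER=/tmp/link-wrapper.sh
```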
I'll fool around a bit more, but it doesn't look like Rust is doing anything incredibly unreasonable here — I think we are uncovering a weird ~~Clang~~ `ld` crash.
update: Even if the list file is completely empty, I get the segfault. So it looks like the problem is with the fundamental handling of that argument.
Also, if I pass the argument directly without the `-Wl`, it still crashes.
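To replay the link step without that argument, one can filter it out of the captured command line; a small sketch, with the argument shapes assumed from the observations above:

```shell
# Echo every argument except the exported-symbols one, one per line.
# Handles both the combined -Wl,-exported_symbols_list,<file> form and
# the direct -exported_symbols_list <file> form (where the file is a
# separate argument).
drop_exported_symbols() {
  local skip_next=0 a
  for a in "$@"; do
    if [ "$skip_next" = 1 ]; then skip_next=0; continue; fi
    case "$a" in
      -Wl,-exported_symbols_list*) ;;           # skip the combined form
      -exported_symbols_list) skip_next=1 ;;    # skip the flag and its file
      *) printf '%s\n' "$a" ;;
    esac
  done
}
```

Feeding the original argv through this and re-invoking the linker driver would reproduce the "no crash" observation, under the assumption that this argument really is the trigger.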
I was able to attach `lldb` to the underlying `ld` process and identify that the crash is happening in the function `ld::passes::order::Layout::Comparer::operator()`. I don't know how to use `lldb` well, though, so that's as far as I'm able to take it (for now).
Web searching doesn't pull up any obvious reports of crashes associated with `-exported_symbols_list` or this function. So my best guess now is that somewhere in the link line, one of the input files contains some kind of cursed symbol, and that the `-exported_symbols_list` argument causes `ld` to scan over all of the input symbols in a way that it wouldn't normally do, causing it to crash.
update: If I use `/usr/bin/ld` instead of `$BUILD/bin/x86_64-apple-darwin13.4.0-ld`, there's no crash. I don't understand the toolchain situation well enough to know if that's interesting or not.
update 2: OK, if I reorder the project `.rlib` dependencies in various ways, the crash goes away?? E.g., if I move the `libstrsim-(yadda).rlib` dependency just a few slots later in the link list, the manual `ld` invocation works. It's not at all clear to me what's changing.
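The reordering experiment can be replayed from a captured argument list; a sketch assuming one argument per line, as a wrapper linker script might have logged it (the file names and paths here are made up):

```shell
# Start from a sample argument list as a wrapper linker might have logged it.
printf '%s\n' -o out.bin liba.rlib libstrsim-0.11.rlib libb.rlib libc.rlib \
  > /tmp/link-args.log

# Move the strsim rlib to the end of the list to test whether order matters.
grep -v 'libstrsim' /tmp/link-args.log >  /tmp/link-args-reordered.log
grep    'libstrsim' /tmp/link-args.log >> /tmp/link-args-reordered.log

# Re-invoke the linker by hand with the reordered arguments, e.g.:
#   "$BUILD_PREFIX/bin/x86_64-apple-darwin13.4.0-ld" $(cat /tmp/link-args-reordered.log)
# (word-splitting is only safe here because none of the sample arguments
# contain spaces)
```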
> If I use `/usr/bin/ld` instead of `$BUILD/bin/x86_64-apple-darwin13.4.0-ld`, there's no crash. I don't understand the toolchain situation well enough to know if that's interesting or not.
That `$BUILD/bin/x86_64-apple-darwin13.4.0-ld` is packaged here, and it's not impossible that there's a bug in this implementation. I say this because when we briefly enabled "hardened" libcxx builds that trap on certain classes of undefined behaviour (c.f. here), `ld` ended up failing on osx in some cases. But even if that's the case, it's not clear to me where the error is coming from, and the most recent version update for that feedstock is also stuck.
@h-vetinari I am leaning towards agreeing that it's a bug in this build of `ld`. I don't see how reordering the input files can cause things to go from "crash" to "no crash" in any non-pathological scenario.
FWIW, the `ld` that crashes is indeed version `ld64-711` from July 2024, according to its `-v` output. My system version — which doesn't crash — is `ld64-609.8` from December 2020. But I note that the newer tool reports using "Apple TAPI version 11.0.0 (tapi-1100.0.11)", while the older system tool reports "Apple TAPI version 12.0.0 (tapi-1200.0.23.5)". I have no idea what TAPI is (and Google reveals absolutely nothing), but I am quite surprised to see the linked version go backwards.
I'm about to wander over to the NumFOCUS project summit, so that will be the end of my flailing around today.
> But I note that the newer tool reports using "Apple TAPI version 11.0.0 (tapi-1100.0.11)", while the older system tool reports "Apple TAPI version 12.0.0 (tapi-1200.0.23.5)".

Indeed, I don't know what TAPI does either (though `ld64` is the only package depending on it). I had proposed an update in https://github.com/conda-forge/tapi-feedstock/pull/10 a while ago, which seems to have come to life just now.
TAPI is a library for reading `.tbd` files and for creating `.tbd` files from `.dylib` files.
Full backtrace:
(lldb) target create "/Users/uwe/conda-bld/ruff_1725524133335/_build_env/bin/x86_64-apple-darwin13.4.0-clang" --core "/cores/core.28365"
Core file '/cores/core.28365' (x86_64) was loaded.
(lldb) bt all
* thread #1
* frame #0: 0x0000000100ae7fed x86_64-apple-darwin13.4.0-ld`void std::__1::__introsort<std::__1::_ClassicAlgPolicy, ld::passes::order::Layout::Comparer&, ld::Atom const**, false>(ld::Atom const**, ld::Atom const**, ld::passes::order::Layout::Comparer&, std::__1::iterator_traits<ld::Atom const**>::difference_type, bool) + 973
frame #1: 0x0000000100ae7de6 x86_64-apple-darwin13.4.0-ld`void std::__1::__introsort<std::__1::_ClassicAlgPolicy, ld::passes::order::Layout::Comparer&, ld::Atom const**, false>(ld::Atom const**, ld::Atom const**, ld::passes::order::Layout::Comparer&, std::__1::iterator_traits<ld::Atom const**>::difference_type, bool) + 454
frame #2: 0x0000000100ae7de6 x86_64-apple-darwin13.4.0-ld`void std::__1::__introsort<std::__1::_ClassicAlgPolicy, ld::passes::order::Layout::Comparer&, ld::Atom const**, false>(ld::Atom const**, ld::Atom const**, ld::passes::order::Layout::Comparer&, std::__1::iterator_traits<ld::Atom const**>::difference_type, bool) + 454
frame #3: 0x0000000100ae6cc5 x86_64-apple-darwin13.4.0-ld`ld::passes::order::Layout::doPass() + 149
frame #4: 0x0000000100ae6dca x86_64-apple-darwin13.4.0-ld`ld::passes::order::doPass(Options const&, ld::Internal&) + 170
frame #5: 0x0000000100886542 x86_64-apple-darwin13.4.0-ld`main + 1026
frame #6: 0x00007ff800a3f345 dyld`start + 1909
Hopefully https://github.com/conda-forge/cctools-and-ld64-feedstock/pull/74 changes something 🤞
I have built the linked PR locally and rerun the `ruff` build, but I still get the same stack trace:
% lldb $(which x86_64-apple-darwin13.4.0-clang) -c /cores/core.53820
(lldb) target create "/Users/uwe/conda-bld/ruff_1725535464874/_build_env/bin/x86_64-apple-darwin13.4.0-clang" --core "/cores/core.53820"
Core file '/cores/core.53820' (x86_64) was loaded.
(lldb) bt all
* thread #1
* frame #0: 0x000000010d0dde1d x86_64-apple-darwin13.4.0-ld`void std::__1::__introsort<std::__1::_ClassicAlgPolicy, ld::passes::order::Layout::Comparer&, ld::Atom const**, false>(ld::Atom const**, ld::Atom const**, ld::passes::order::Layout::Comparer&, std::__1::iterator_traits<ld::Atom const**>::difference_type, bool) + 973
frame #1: 0x000000010d0ddc16 x86_64-apple-darwin13.4.0-ld`void std::__1::__introsort<std::__1::_ClassicAlgPolicy, ld::passes::order::Layout::Comparer&, ld::Atom const**, false>(ld::Atom const**, ld::Atom const**, ld::passes::order::Layout::Comparer&, std::__1::iterator_traits<ld::Atom const**>::difference_type, bool) + 454
frame #2: 0x000000010d0ddc16 x86_64-apple-darwin13.4.0-ld`void std::__1::__introsort<std::__1::_ClassicAlgPolicy, ld::passes::order::Layout::Comparer&, ld::Atom const**, false>(ld::Atom const**, ld::Atom const**, ld::passes::order::Layout::Comparer&, std::__1::iterator_traits<ld::Atom const**>::difference_type, bool) + 454
frame #3: 0x000000010d0dcaf5 x86_64-apple-darwin13.4.0-ld`ld::passes::order::Layout::doPass() + 149
frame #4: 0x000000010d0dcbfa x86_64-apple-darwin13.4.0-ld`ld::passes::order::doPass(Options const&, ld::Internal&) + 170
frame #5: 0x000000010ce7c952 x86_64-apple-darwin13.4.0-ld`main + 1026
frame #6: 0x00007ff800a3f345 dyld`start + 1909
Has anyone checked whether an older ld64 version still works?! It's conceivable that this is a bug in v907 specifically.
@h-vetinari FYI this bug is still hitting conda-forge/deno-feedstock#132, even with the new `cctools_osx-64` from the PR you linked (I have checked the build logs to confirm that the most recent `cctools` with build number 4 is being used). I also tried bumping the `syn` version, and that did not fix the segfault either.
In a local test build of pixi, https://github.com/conda-forge/cctools-and-ld64-feedstock/pull/70 fixes the failure for me.
update: in a local test of https://github.com/conda-forge/deno-feedstock/pull/132, the macOS build doesn't succeed, but it gets past the segfault issue in the current version of the PR.
https://github.com/conda-forge/cctools-and-ld64-feedstock/pull/70 has been merged and has propagated. I've retriggered a few sample builds of the Rust projects that were failing and I think they're working now, so hopefully this issue is fixed.
from https://github.com/conda-forge/polars-feedstock/pull/253#issuecomment-2259797774