NixOS / nixpkgs

cctools ld segfaults when linking haskellPackages.Agda (2.6.2 only?) #149692

Closed sternenseemann closed 2 months ago

sternenseemann commented 2 years ago

Example Log:

Building executable 'agda' for Agda-2.6.2..
[1 of 1] Compiling Main             ( src/main/Main.hs, dist/build/agda/agda-tmp/Main.o )
Linking dist/build/agda/agda ...
/nix/store/bp55vzlmcyqcx9n08pkzslvks10zybwc-clang-wrapper-11.1.0/bin/ld: line 256: 79105 Segmentation fault: 11  /nix/store/3fl5z9yfnz08kjb4cnhw2w8a2zsm5qim-cctools-binutils-darwin-949.0.1/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
`cc' failed in phase `Linker'. (Exit code: 139)
builder for '/nix/store/fyzdyk7a79jqfh4r2gvjc65d9f7w22wd-Agda-2.6.2.drv' failed with exit code 1
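
A note on reading the log above: the "exit code 139" reported by clang and GHC is consistent with the "Segmentation fault: 11" line, because shells conventionally report a signal death as 128 plus the signal number, and 139 - 128 = 11 (SIGSEGV). A minimal sketch of that decoding; the function name `decode_exit_status` is a hypothetical helper, not part of any tool in this thread:

```shell
# Decode a shell-style exit status. By convention, a status above 128
# means the process was terminated by signal (status - 128).
decode_exit_status() {
  if [ "$1" -gt 128 ]; then
    echo "signal $(( $1 - 128 ))"
  else
    echo "exit $1"
  fi
}
```

Calling `decode_exit_status 139` prints `signal 11`, which indicates that the linker itself crashed rather than reporting an ordinary link error.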

Steps To Reproduce

nix-build -A haskellPackages.Agda on aarch64-darwin

cc @NixOS/darwin-maintainers

prusnak commented 2 years ago

Does adding autoSignDarwinBinariesHook to nativeBuildInputs help?

prusnak commented 2 years ago

> Does adding autoSignDarwinBinariesHook to nativeBuildInputs help?

Ah, sorry. That hook is for fixing the "Killed: 9" error, not the "Segmentation fault: 11" error.

sternenseemann commented 2 years ago

Yeah, Agda is a normal Haskell package, and those work for the most part.

veprbl commented 2 years ago

It would help to post a crash dump from Other -> Console -> User Reports.

sternenseemann commented 2 years ago

Indeed, would be nice if someone could have a look at this locally, I can only check on Hydra logs, really.

olebedev commented 2 years ago

Getting:

Linking dist/build/agda/agda ...
/nix/store/bp55vzlmcyqcx9n08pkzslvks10zybwc-clang-wrapper-11.1.0/bin/ld: line 256: 36953 Segmentation fault: 11  /nix/store/3fl5z9yfnz08kjb4cnhw2w8a2zsm5qim-cctools-binutils-darwin-949.0.1/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
`cc' failed in phase `Linker'. (Exit code: 139)
builder for '/nix/store/fyzdyk7a79jqfh4r2gvjc65d9f7w22wd-Agda-2.6.2.drv' failed with exit code 1
error: build of '/nix/store/fyzdyk7a79jqfh4r2gvjc65d9f7w22wd-Agda-2.6.2.drv' failed

on 7370b263d797848f9aa391fc94999eece396f35d

turion commented 2 years ago

Is this maybe fixed since 2.6.2.1 (https://hydra.nixos.org/build/161184182)?

sternenseemann commented 2 years ago

Interesting, let's keep this open in case it happens in another package

frontsideair commented 2 years ago

I have the exact same error with haskellPackages.lattices, as you can see from the Hydra job.

sternenseemann commented 2 years ago

Agda 2.6.2.1 started failing again as well iirc. I suspect this could possibly be OOM or a segfault, but impossible to test from my position.

turion commented 2 years ago

Can someone on darwin reproduce and bisect?

ZachFontenot commented 2 years ago

I'm hitting this locally on aarch64-darwin, but I'm not sure how I can help out.

siraben commented 2 years ago

I can't find the commit where this succeeds on aarch64-darwin, has anyone found it?

sternenseemann commented 2 years ago

2c85c77a8a4c2580e7b116f0c62e4fa233b7238d according to hydra

ZachFontenot commented 2 years ago

bff49dab05df8c825b268ca863140e6a119626b1 is the first bad commit
commit bff49dab05df8c825b268ca863140e6a119626b1
Merge: 40d43349e66 a7dda03a2e0
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Tue Nov 23 06:01:42 2021 +0000

That is the result of my bisect. Sadly, it's a huge commit.
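
For anyone wanting to repeat the bisection, it can be automated with `git bisect run`. This is only a sketch under assumptions: `bisect_first_bad` is a hypothetical wrapper, and it treats any failure of the test command as "bad", so an unrelated build failure on an intermediate revision would mislead it.

```shell
# Hypothetical helper: bisect between a known-good and a known-bad
# revision, using an arbitrary command as the test.
# git bisect run treats exit 0 as good, 1-127 (except 125) as bad,
# and 125 as "skip this revision".
bisect_first_bad() {
  good=$1
  bad=$2
  shift 2
  git bisect start "$bad" "$good" && git bisect run "$@"
}
```

In a nixpkgs checkout on aarch64-darwin this could be invoked as `bisect_first_bad 2c85c77a8a4c2580e7b116f0c62e4fa233b7238d <known-bad-rev> nix-build -A haskellPackages.Agda`, where the good revision is the one reported from Hydra above and `<known-bad-rev>` is a placeholder for any revision where the build fails. Run `git bisect reset` afterwards to return to the original checkout.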

siraben commented 2 years ago

My bisection agrees:

bff49dab05df8c825b268ca863140e6a119626b1 is the first bad commit
commit bff49dab05df8c825b268ca863140e6a119626b1
Merge: 40d43349e66 a7dda03a2e0
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Date:   Tue Nov 23 06:01:42 2021 +0000

    Merge staging-next into staging

unbel13ver commented 2 years ago

Hello. I faced the same issue while building coreutils package for aarch64-linux (natively). At configure step I see in config.log:

configure:5968: checking whether the C compiler works
configure:5990: gcc    conftest.c  >&5
/nix/store/h190xsp8qqcv4gqig698nvvpllkiabhi-bootstrap-stage4-gcc-wrapper-9.3.0/bin/ld: line 256:  1438 Segmentation fault      /nix/store/5p5gi835bb9p1fiw8dxbfs7dzkk3jd5m-binutils-2.38/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
collect2: error: ld returned 139 exit status

My nixpkgs checkout is at:

commit 1919e181fb00d7620b586206efe938bd3fd1e67c (HEAD -> master, origin/master, origin/HEAD)
Merge: cbebdfc3da9 17be6f75ce6
Author: Thiago Kenji Okada <thiagokokada@gmail.com>
Date:   Thu May 12 10:01:16 2022 +0100

veprbl commented 2 years ago

@unbel13ver That can't be related. Please file a separate issue.

soulomoon commented 2 years ago

I don't know if it is the same issue, but with this nixpkgs input:

    "nixpkgs": {
      "locked": {
        "lastModified": 1645433236,
        "narHash": "sha256-4va4MvJ076XyPp5h8sm5eMQvCrJ6yZAbBmyw95dGyw4=",
        "owner": "nixos",
        "repo": "nixpkgs",
        "rev": "7f9b6e2babf232412682c09e57ed666d8f84ac2d",
        "type": "github"
      },

building '/nix/store/8zr9cl44sdnr6mg70g3s3xkvl4dkdi0c-Agda-2.6.2.1.drv'...
error: builder for '/nix/store/8zr9cl44sdnr6mg70g3s3xkvl4dkdi0c-Agda-2.6.2.1.drv' failed with exit code 1;
       last 10 log lines:
       > [398 of 400] Compiling Agda.Compiler.JS.Compiler ( src/full/Agda/Compiler/JS/Compiler.hs, dist/build/Agda/Compiler/JS/Compiler.o, dist/build/Agda/Compiler/JS/Compiler.dyn_o )
       > [399 of 400] Compiling Agda.Compiler.Builtin ( src/full/Agda/Compiler/Builtin.hs, dist/build/Agda/Compiler/Builtin.o, dist/build/Agda/Compiler/Builtin.dyn_o )
       > [400 of 400] Compiling Agda.Main        ( src/full/Agda/Main.hs, dist/build/Agda/Main.o, dist/build/Agda/Main.dyn_o )
       > Preprocessing executable 'agda' for Agda-2.6.2.1..
       > Building executable 'agda' for Agda-2.6.2.1..
       > [1 of 1] Compiling Main             ( src/main/Main.hs, dist/build/agda/agda-tmp/Main.o )
       > Linking dist/build/agda/agda ...
       > /nix/store/km02igh4pshp20d0wn89rf5jjfxcm8v5-clang-wrapper-11.1.0/bin/ld: line 256: 44923 Segmentation fault: 11  /nix/store/diz9chv9r9m3pv9rzi7g3p2iq8vgsmr3-cctools-binutils-darwin-949.0.1/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
       > clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
       > `cc' failed in phase `Linker'. (Exit code: 139)
       For full logs, run 'nix log /nix/store/8zr9cl44sdnr6mg70g3s3xkvl4dkdi0c-Agda-2.6.2.1.drv'.

Jake-Gillberg commented 2 years ago

For anyone on M1 experiencing this issue, overriding with the x86_64-darwin package works with Rosetta.

CrepeGoat commented 2 years ago

> For anyone on M1 experiencing this issue, overriding with the x86_64-darwin package works with Rosetta.

A+ suggestion! In case others (like me 😅) want to try this but aren't familiar with how, here's a blog that talks about configuring nix to build x86_64 programs: https://evanrelf.com/building-x86-64-packages-with-nix-on-apple-silicon

denizdogan commented 2 years ago

For what it's worth, this is happening to me too, with Ormolu:

[11 of 11] Compiling Main             ( tests/Spec.hs, dist/build/tests/tests-tmp/Main.o, dist/build/tests/tests-tmp/Main.dyn_o )
Linking dist/build/tests/tests ...
/nix/store/48py6zrawzim9ghrnkqwm36jl4j1l23x-clang-wrapper-11.1.0/bin/ld: line 256: 70672 Segmentation fault: 11  /nix/store/5wvlj00dr22ivh210b18ccv1i60h6c1q-cctools-binutils-darwin-949.0.1/bin/ld ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"}
clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
`cc' failed in phase `Linker'. (Exit code: 139)

It's nice that there's the workaround of compiling for x86_64, but is there really no other way?

veprbl commented 2 years ago

@denizdogan Please post a stacktrace. You can find it using Other -> Console -> Crash Reports.

denizdogan commented 2 years ago

ld-2022-09-27-003114.txt

@veprbl

turion commented 1 year ago

2.6.2.2 failed as well on aarch64-darwin: https://logs.nix.ci/?key=nixos/nixpkgs.210418&attempt_id=fa802878-e6af-4d7a-b890-16bc01482839

wildsebastian commented 1 year ago

I figured out that it works when I remove -foptimise-heavily from the line linked below, though I do not know enough to explain why this is the case: https://github.com/NixOS/nixpkgs/blob/518f09f2d0e8829c3ef77c0f535df309f49ed6d9/pkgs/development/haskell-modules/configuration-nix.nix#L943

sternenseemann commented 1 year ago

My theory would be that the GHC flags enabled by that flag make the object files much bigger, which cctools ld can't always deal with.

domenkozar commented 1 year ago

Looking at https://github.com/agda/agda/blob/master/Agda.cabal#L658, this could be a bug in GHC.

@angerman have you seen this before?

domenkozar commented 1 year ago

I've noticed that GHC doesn't use llvmPackages supplied to provide clang, but it uses whatever stdenv provides. That's why it's using clang_11 in these builds.

sternenseemann commented 1 year ago

That should not matter, as aarch64-darwin doesn’t use the LLVM backend. Under normal circumstances you’d also use whatever clang comes with the Xcode SDK.

domenkozar commented 1 year ago

@sternenseemann any ideas how to tell GHC to pick linker from a newer llvmPackages? I've tried passing a newer stdenv to generic builder, but that doesn't help.

sternenseemann commented 1 year ago

@domenkozar GHC gets no linker from llvmPackages – on macOS it'll use cctools' ld which has nothing to do with lld or LLVM.

wildsebastian commented 1 year ago

I played around with this a little more and have some more information, though I am still missing a few links to get the full picture.

On a side note, the -foptimise-heavily flag sets these two GHC flags: -fexpose-all-unfoldings and -fspecialise-aggressively. Either one by itself works; using both causes ld to segfault.

More interestingly, it works when I use /usr/bin/ld. I tried this by modifying clang-wrapper-11.1.0/bin/ld in my nix store. It segfaults when cctools-binutils-darwin-973.0.1/bin/ld is used, which is a symlink to cctools-port-973.0.1/bin/ld. I had a look at both executables with file, and there are two differences in the flags:

/usr/bin/ld: flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|PIE>
cctools-port-973.0.1/bin/ld: flags:<NOUNDEFS|DYLDLINK|TWOLEVEL|WEAK_DEFINES|BINDS_TO_WEAK|PIE>

Currently my guess is a bug in https://github.com/tpoechtrager/cctools-port. Does anyone have a suggestion for where it would make sense to dig deeper?

(Sorry for some information that might be obvious to more experienced contributors/users, but this is part of my notes and I do not know about all the implementation detail)
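
The two flag lists quoted above can be compared mechanically. The sketch below is only an illustration: `extra_flags` is a hypothetical helper, and it works on the `|`-separated strings as pasted in the comment, not on the binaries themselves.

```shell
# Print every flag that appears in the second '|'-separated list but
# not in the first, one per line.
extra_flags() {
  for flag in $(printf '%s' "$2" | tr '|' ' '); do
    case "|$1|" in
      *"|$flag|"*) ;;        # present in both lists
      *) echo "$flag" ;;     # only in the second list
    esac
  done
}
```

Running `extra_flags 'NOUNDEFS|DYLDLINK|TWOLEVEL|PIE' 'NOUNDEFS|DYLDLINK|TWOLEVEL|WEAK_DEFINES|BINDS_TO_WEAK|PIE'` prints WEAK_DEFINES and BINDS_TO_WEAK. Those are Mach-O header flags of the linker binary itself, so they describe how each ld was built and do not by themselves explain the crash.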

veprbl commented 1 year ago

@wildsebastian Maybe try with Apple's cctools #157628

domenkozar commented 1 year ago

@veprbl it doesn't build on aarch64-darwin for me.

domenkozar commented 1 year ago

I'm currently trying to upgrade cctools to see if that will fix it, otherwise using the apple version might be our only choice (and it needs to be fixed to build on aarch64-darwin).

domenkozar commented 1 year ago

Still segfaults, I sadly don't have time to look into fixing the build of apple cctools.

cidkidnix commented 1 year ago

I'm currently looking into the build issues of apple-cctools on aarch64-darwin. Also, the cctools-port stack trace suggests that it might be stack smashing, though I haven't looked into the code much.

reckenrode commented 1 year ago

I made some progress building cctools-apple on aarch64-darwin, but I ran into some issues.

domenkozar commented 1 year ago

Could we just use SDK 11?

reckenrode commented 1 year ago

> Could we just use SDK 11?

Sorry about the delayed response. I missed the notification. Unfortunately, no.

The issues identified above are with the 11.0 SDK. I was able to make some progress by hacking at the SDK in the store, but that’s only good enough to see if it’s possible to build. The 11.0 SDK would need to be updated to fix those before cctools could be built.

229210 is a PR to add the 13.3 SDK. I’ve mentioned those issues in that PR in the hope they can be addressed for that SDK; they can then be backported to the 11.0 SDK if/when it’s refactored.

reckenrode commented 1 year ago

As part of my testing for #229786, I attempted to build Agda. It builds now. While I have replaced some of cctools with its LLVM equivalents, I didn’t change the linker. I don’t know why it builds now, but I’ll take it.

$ nix build .#haskellPackages.Agda --out-link result-agda
$ ./result-agda/bin/agda --version
Agda version 2.6.3
$ file ./result-agda/bin/agda
./result-agda/bin/agda: Mach-O 64-bit executable arm64

reckenrode commented 1 year ago

Actually, it looks like it builds on master now, so it’s definitely unrelated. Still, that’s good news.

sternenseemann commented 1 year ago

@reckenrode The crash is not very consistent. It seemed reproducible for a single nixpkgs revision (if I recall correctly), but would come and go over time due to seemingly unrelated changes. So good news is only tentative…

arjunkathuria commented 1 year ago

This is happening with doctest too, using GHC 8.10.7 on aarch64-darwin (native, not Rosetta) while trying to run a profiling build. IIRC this particular GHC uses the LLVM backend when running natively on ARM chips. Could that be involved in this?

 > Linking dist/build/doctest/doctest ...
 > /nix/store/b0d5xkncz7jn72kkgzsaizpk9170mybf-clang-wrapper-11.1.0/bin/ld: line 269: 47083 Segmentation fault: 11  /nix/store/504j6b0jja97y5w2jfzwpgc4m35b6mm1-cctools-binutils-darwin-973.0.1/bin/ld @<(printf "%q\n" ${extraBefore+"${extraBefore[@]}"} ${params+"${params[@]}"} ${extraAfter+"${extraAfter[@]}"})
 > clang-11: error: linker command failed with exit code 139 (use -v to see invocation)
 > `cc' failed in phase `Linker'. (Exit code: 139)

cc: @sternenseemann

cidkidnix commented 1 year ago

While working with some ios64 stuff I ran into this. What seems to fix it every time for me is passing the compiler "-fwhole-archive-libs", which should stop GHC itself from stripping (dead-stripping, IIRC) the resulting executable. I don't have a regular aarch64-darwin system to check whether it works for M1/M2 at the moment, but it fixed all iOS builds for me!

arjunkathuria commented 1 year ago

Hi @cidkidnix , thanks for taking the time to reply to this. I'll try that and see if that fixes things for me.

john-rodewald commented 1 year ago

> -fwhole-archive-libs

I ran into this segfault trying to compile Hasura on aarch64 and this flag was the final missing puzzle piece. Thank you very much!

sternenseemann commented 1 year ago

Interesting, we could consider enabling this by default via haskellPackages.mkDerivation on aarch64-darwin.

Can you compare the output size of built libraries with and without this flag? Knowing how effective stripping is would be useful for making this decision.
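
One way to make that size comparison is sketched below. It assumes two build outputs already exist, one built with the flag and one without; the function names and the out-link names in the usage note are hypothetical.

```shell
# Sum the sizes in bytes of all regular files below a directory.
# -H makes find follow the argument itself if it is a symlink, as
# nix out-links such as ./result are.
total_bytes() {
  find -H "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Print "<first> <second> <percent>", where percent is the size of the
# first tree relative to the second (integer arithmetic; the second
# tree must not be empty).
compare_sizes() {
  with=$(total_bytes "$1")
  without=$(total_bytes "$2")
  echo "$with $without $(( with * 100 / without ))"
}
```

For example, `compare_sizes ./result-with-flag ./result-without-flag` prints both totals and the ratio as a percentage, which gives a rough idea of how much stripping is lost when the flag is enabled.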

cidkidnix commented 1 year ago

> Interesting, we could consider enabling this by default via haskellPackages.mkDerivation on aarch64-darwin.

I'm interested too. I've been meaning to submit a patch upstream to GHC, but I haven't found the time.

Assuming the output size isn't massively different, the relevant fix would be something akin to:

osSubsectionsViaSymbols :: Platform -> Bool
osSubsectionsViaSymbols platform =
  case (platformArch platform, platformOS platform) of
    (ArchAArch64, OSDarwin) -> False  -- proposed change: disable on aarch64-darwin
    (_, OSDarwin)           -> True
    (_, _)                  -> False

in https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Platform.hs#L230