NixOS / nixpkgs

haskell packages with separate bin outputs fail due to a reference cycle on aarch64-darwin #140774

Closed (walkah closed this issue 1 year ago)

walkah commented 3 years ago

Describe the bug

niv fails to build on aarch64-darwin with the following errors:

Linking dist/build/niv/niv ...
running tests
Running 1 test suites...
Test suite unit: RUNNING...
Test suite unit: PASS
Test suite logged to: dist/test/niv-0.2.19-unit.log
1 of 1 test suites (1 of 1 test cases) passed.
haddockPhase
installing
Installing library in /nix/store/v5p7lliij8f10c920sa8lwnqv8di6007-niv-0.2.19/lib/ghc-8.10.7/aarch64-osx-ghc-8.10.7/niv-0.2.19-HYXnAgZhF2cLmqAJFPpkgw
Installing executable niv in /nix/store/abawbski981061gbsnq0v5374gvsrbq7-niv-0.2.19-bin/bin
Warning: The directory
/nix/store/abawbski981061gbsnq0v5374gvsrbq7-niv-0.2.19-bin/bin is not in the
system search path.
/nix/store/mp5xsfnvdbdyqjmcipyks4nph534jc3q-cctools-binutils-darwin-949.0.1/bin/strip: changes being made to the file will invalidate the code signature in: /nix/store/abawbski981061gbsnq0v5374gvsrbq7-niv-0.2.19-bin/bin/niv
Registering library for niv-0.2.19..
post-installation fixup
strip is /nix/store/vf55lp3k5xgx1vlkysb7pbw76f1wh21l-clang-wrapper-11.1.0/bin/strip
stripping (with command strip and flags -S) in /nix/store/v5p7lliij8f10c920sa8lwnqv8di6007-niv-0.2.19/lib
patching script interpreter paths in /nix/store/v5p7lliij8f10c920sa8lwnqv8di6007-niv-0.2.19
strip is /nix/store/vf55lp3k5xgx1vlkysb7pbw76f1wh21l-clang-wrapper-11.1.0/bin/strip
patching script interpreter paths in /nix/store/hkapz2a124mmzc3x1mw2dqvds3bhkfg4-niv-0.2.19-data
strip is /nix/store/vf55lp3k5xgx1vlkysb7pbw76f1wh21l-clang-wrapper-11.1.0/bin/strip
stripping (with command strip and flags -S) in /nix/store/abawbski981061gbsnq0v5374gvsrbq7-niv-0.2.19-bin/bin
patching script interpreter paths in /nix/store/abawbski981061gbsnq0v5374gvsrbq7-niv-0.2.19-bin
cycle detected in the references of '/nix/store/abawbski981061gbsnq0v5374gvsrbq7-niv-0.2.19-bin' from '/nix/store/v5p7lliij8f10c920sa8lwnqv8di6007-niv-0.2.19'
error: build of '/nix/store/k1jbc0389f58cwwy9xvi9r2xi5fmqdc2-niv-0.2.19.drv' failed

Recent hydra builds also fail: https://hydra.nixos.org/job/nixpkgs/nixpkgs-unstable-aarch64-darwin/niv.aarch64-darwin

Steps To Reproduce

Steps to reproduce the behavior:

  1. nix-shell -p niv

Expected behavior

niv should install and be available.

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"aarch64-darwin"`
 - host os: `Darwin 20.6.0, macOS 11.6`
 - multi-user?: `yes`
 - sandbox: `no`
 - version: `nix-env (Nix) 2.3.15`
 - channels(walkah): `"darwin, home-manager"`
 - channels(root): `"nixpkgs-21.11pre320922.ee084c02040"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixpkgs`

sternenseemann commented 3 years ago

See also #140180 for the same issue. Seems like haskell packages with separate bin outputs just don't work on aarch64-darwin for some reason, likely codesigning?!

sternenseemann commented 2 years ago

@NixOS/darwin-maintainers please can someone look into this issue? I literally have no way of debugging this.

starcraft66 commented 2 years ago

I attempted to build niv with enableSeparateDataOutput = false; and still get the same error. I haven't touched haskellPackages before and I'm extremely unfamiliar with how the outputs are created and how they relate to each other in a way that creates a loop.

onthestairs commented 2 years ago

Setting enableSeparateBinOutput = false; (bin not data) worked for me, but I'm also quite unfamiliar with how things work so it was a bit of a guess.

sternenseemann commented 2 years ago

Well, that's an obvious workaround. What I'm more interested in is why bin outputs in Haskell packages create cyclic dependencies between outputs only on aarch64-darwin.

toonn commented 2 years ago

Which references are causing the cycles? Is it something like an extra codesigning binary in bin which is also used from out?

cideM commented 2 years ago

I'm seeing this issue in a Nix shell with ghcid and ormolu. I'll look into this a little more in the evening but at least to me it seems like this affects more than just a single package

sternenseemann commented 2 years ago

Yes, every haskell package with a separate bin output is broken at the moment. Would be great to find out what actually causes the reference cycle.

cideM commented 2 years ago

I think we should change the title of the issue to reflect the broader scope. I tried bisecting on my M1 machine earlier but build times are brutal for all the Haskell machinery. I might have to try a different approach later on. I did check if otool -l path/to/ghcid-bin references anything from the ghcid lib output but it did not. Which is strange because then what's responsible for the circular reference? Anyhow, will look into this more later.

hexagonal-sun commented 2 years ago

I can upload the failed build directory of ormolu (a failing haskell package) being built on aarch64-darwin if that helps?

sternenseemann commented 2 years ago

> I can upload the failed build directory of ormolu (a failing haskell package) being built on aarch64-darwin if that helps?

This is of limited use because it doesn't really tell you what is wrong in the installed outputs. I'd expect that the references get introduced in fixupPhase, which runs after installation… I guess you could grep for the values of niv.bin.outPath and niv.out.outPath in the build directory and see if something interesting turns up.

This nix feature would be useful right about now…
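
The grep suggested above can be sketched in Python. This is only an illustration, not a nixpkgs tool: the function name find_references is made up here, and the two hashes are the bin and out hashes of niv taken from the build log at the top of this issue. It walks a directory and reports every regular file that contains one of the hashes.

```python
import os

# Hash parts of the niv outputs, copied from the build log in this issue.
NIV_BIN_HASH = "abawbski981061gbsnq0v5374gvsrbq7"
NIV_OUT_HASH = "v5p7lliij8f10c920sa8lwnqv8di6007"


def find_references(root, needles):
    """Yield (file path, needle) for every needle found in a file under root.

    Hypothetical helper: it reads each regular file as bytes and does a plain
    substring search, so it also finds hashes embedded in binaries.
    """
    encoded = [(needle, needle.encode()) for needle in needles]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents matter here.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                data = handle.read()
            for needle, raw in encoded:
                if raw in data:
                    yield path, needle
```

Calling list(find_references(build_dir, [NIV_BIN_HASH, NIV_OUT_HASH])) on a kept build directory would then list the files worth inspecting by hand.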

hexagonal-sun commented 2 years ago

> > I can upload the failed build directory of ormolu (a failing haskell package) being built on aarch64-darwin if that helps?
>
> This is of limited use because it doesn't really tell you what is wrong in the installed outputs. I'd expect that the references get introduced in fixupPhase which is after installation… I guess you could grep for the value of niv.bin.outPath and niv.out.outPath in the build directory and see if it turns up something interesting.
>
> This nix feature would be useful right about now…

If I do: nix show-derivation /nix/store/120xvbzmkglal2zgq43bx8n82cn3q8h5-ghcid-0.8.7.drv I can see the output directories in the store:

{
  "/nix/store/120xvbzmkglal2zgq43bx8n82cn3q8h5-ghcid-0.8.7.drv": {
    "outputs": {
      "bin": {
        "path": "/nix/store/781rfvh4fb9wjz33zr5y1nfli1fv5yzp-ghcid-0.8.7-bin"
      },
      "doc": {
        "path": "/nix/store/27ik61ijjsgjga35g5n021l8j30fqf6g-ghcid-0.8.7-doc"
      },
      "out": {
        "path": "/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7"
      }
    },

[...]

Those directories do have the built binaries and libs in them. Would that be of more use? I.e., would the fixupPhase have occurred at that point?

hexagonal-sun commented 2 years ago

I've been doing a bit more digging on this and after running nix build --debug --keep-failed ".#ghcid", I found the following:

scanning for references for output 'bin' in temp location '/nix/store/781rfvh4fb9wjz33zr5y1nfli1fv5yzp-ghcid-0.8.7-bin'
found reference to 'w8079fmkwpblbcpj0vyaaj49l1ib8q95' at offset '2043'
found reference to 'hchnpr6dwnq47c2m5mm7i65wxx3cdhmi' at offset '2147'
found reference to 'f42zzy8y0fjc2fsfxs4ylpd6wxzgmx04' at offset '2251'
found reference to 'l0rp423b2w8b065w5cdx4s3hanjx464x' at offset '2431'
found reference to '781rfvh4fb9wjz33zr5y1nfli1fv5yzp' at offset '45769'
found reference to 'wihaahmwj1hksyf117dqvi6s249dg23v' at offset '45846' [lib output]
scanning for references for output 'doc' in temp location '/nix/store/27ik61ijjsgjga35g5n021l8j30fqf6g-ghcid-0.8.7-doc'
found reference to 'rffixa56qdb6vrrn85m9434r6nnyz7nc' at offset '2029'
found reference to 'hwzhi4m3mr6y3raq12xhf3wqi36qpf08' at offset '3588'
found reference to 'zh7w4sk7kkry40gp02hry61fhgrfagsw' at offset '5388'
found reference to '43zgrl31xb6brandd5vn729ijfdnq7qj' at offset '8128'
found reference to '781rfvh4fb9wjz33zr5y1nfli1fv5yzp' at offset '11537'
scanning for references for output 'out' in temp location '/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7'
found reference to 'hrwkdw342mxc1y5a0czf5x8zsrn7svp0' at offset '402'
found reference to '781rfvh4fb9wjz33zr5y1nfli1fv5yzp' at offset '1668' [bin output]
found reference to 'wihaahmwj1hksyf117dqvi6s249dg23v' at offset '2944'
found reference to 'w8079fmkwpblbcpj0vyaaj49l1ib8q95' at offset '2319'
found reference to 'hchnpr6dwnq47c2m5mm7i65wxx3cdhmi' at offset '3327'
found reference to 'f42zzy8y0fjc2fsfxs4ylpd6wxzgmx04' at offset '3791'
[...]

I'm not too sure how these references are found, but I presume that it's doing something similar to running strings across the binary and looking for a valid nix store hash. If that's the case, this is the reference from bin to out:

@platoon:/n/s/7/bin🔒 ❯ strings ghcid | grep store
/nix/store/w8079fmkwpblbcpj0vyaaj49l1ib8q95-libiconv-50/lib/libiconv.dylib
/nix/store/hchnpr6dwnq47c2m5mm7i65wxx3cdhmi-gmp-6.2.1/lib/libgmp.10.dylib
/nix/store/f42zzy8y0fjc2fsfxs4ylpd6wxzgmx04-libffi-3.4.2/lib/libffi.8.dylib
/nix/store/l0rp423b2w8b065w5cdx4s3hanjx464x-apple-framework-CoreFoundation-11.0.0/Library/Frameworks
/nix/store/781rfvh4fb9wjz33zr5y1nfli1fv5yzp-ghcid-0.8.7-bin/bin
[references to out follow]
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/lib/ghc-8.10.7/aarch64-osx-ghc-8.10.7/ghcid-0.8.7-5ZcL75a2atNERQtuLKeyuL-ghcid
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/lib/ghc-8.10.7/aarch64-osx-ghc-8.10.7
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/share/aarch64-osx-ghc-8.10.7/ghcid-0.8.7
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/libexec/aarch64-osx-ghc-8.10.7/ghcid-0.8.7
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/etc

The reference from out to bin:

@platoon:/n/s/w/l/g/aarch64-osx-ghc-8.10.7🔒 ❯ strings libHSghcid-0.8.7-Eq9Lb7GjR4QBjEa2ZCK5Kw-ghc8.10.7.dylib | grep store           
[...]
/nix/store/hrwkdw342mxc1y5a0czf5x8zsrn7svp0-ghc-8.10.7/lib/ghc-8.10.7/unix-2.7.2.2
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/lib/links
/nix/store/l0rp423b2w8b065w5cdx4s3hanjx464x-apple-framework-CoreFoundation-11.0.0/Library/Frameworks
[reference to bin follows]
/nix/store/781rfvh4fb9wjz33zr5y1nfli1fv5yzp-ghcid-0.8.7-bin/bin
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/lib/ghc-8.10.7/aarch64-osx-ghc-8.10.7/ghcid-0.8.7-Eq9Lb7GjR4QBjEa2ZCK5Kw

Having a reference from bin -> out makes sense, the more curious case is the reference from out -> bin. Strangely, it looks as though it's only referencing the bin folder, not the actual ghcid executable.
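
To make the cycle concrete, here is a small Python model of the check, fed with the references from the debug log above. It is a sketch under assumptions: find_cycle is a hypothetical helper, not Nix's implementation, and Nix works on store paths rather than on output names. Self-references are ignored, since an output may refer to itself.

```python
def find_cycle(refs):
    """Return one reference cycle as a list of output names, or None.

    refs maps each output name to the set of outputs it references.
    This is a plain depth-first search and only a model of the check.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in refs}
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for target in sorted(refs.get(node, ())):
            # Skip self-references and anything outside this derivation.
            if target == node or target not in colour:
                continue
            if colour[target] == GREY:
                return stack[stack.index(target):] + [target]
            if colour[target] == WHITE:
                found = visit(target)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in sorted(refs):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# References between the ghcid outputs as reported by the scan above.
aarch64_darwin = {"bin": {"bin", "out"}, "doc": {"bin"}, "out": {"bin", "out"}}
# The same graph without the bin -> out edge.
without_bin_to_out = {"bin": {"bin"}, "doc": {"bin"}, "out": {"bin", "out"}}
```

With the first graph find_cycle reports bin -> out -> bin; once the bin -> out edge is gone there is no cycle left, which is why that single edge decides whether the build fails.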

sternenseemann commented 2 years ago

Actually the bin -> out reference is probably the anomaly since the executable should be statically linked (against the haskell libraries, system deps are linked dynamically) and thus contain no reference to the library.

I wonder how the reference comes into existence…

hexagonal-sun commented 2 years ago

@sternenseemann Indeed, you are correct. I've just checked on my x86_64-darwin system and there is a reference from out -> bin but no reference from bin -> out, so it looks as though the ghcid binary is the problem. I've uploaded it here for other people to take a look at, but I'll see if I can figure out where the reference is coming from. The bad references can be seen with strings ghcid | grep wihaahmwj1hksyf1.

ghcid-bin-aarch64-darwin.tar.gz

Update: After throwing the binary into Ghidra to see where the references are coming from, it looks as though they are the values of the Haskell functions getLibDir, getDynLibDir, getDataDir, getLibexecDir and getSysconfDir:

[Screenshot 2021-12-04 at 08 32 40]

These symbols don't exist in the x86_64-darwin binary.

domenkozar commented 2 years ago

Maybe we can just make bin output a no-op on aarch64-darwin until it's fixed?

sternenseemann commented 2 years ago

I'm against this for the simple reason it eliminates any incentive to fix the bug. Separate bin outputs are quite important as they significantly reduce the closure size: Haskell binaries are statically linked, so the download of the libraries plus the entire dependency closure is unnecessary if you only want to execute a tool.

Given Rosetta, there is a decent enough workaround for the failing builds.

domenkozar commented 2 years ago

> Separate bin outputs are quite important as they significantly reduce the closure size: Haskell binaries are statically linked, so the download of the libraries plus the entire dependency closure is unnecessary if you only want to execute a tool.

That's the incentive :)

domenkozar commented 2 years ago

> @sternenseemann Indeed, you are correct. I've just checked on my x86_64-darwin system and there is a reference from out -> bin but no reference from bin -> out, so it looks as though the ghcid binary is the problem. I've uploaded it here for other people to take a look at, but I'll see if I can figure out where the reference is coming from. The bad references can be seen with strings ghcid | grep wihaahmwj1hksyf1.
>
> ghcid-bin-aarch64-darwin.tar.gz
>
> Update: After throwing the binary into Ghidra to see where the references are coming from, it looks as though they are the values of the Haskell functions getLibDir, getDynLibDir, getDataDir, getLibexecDir and getSysconfDir:
>
> [Screenshot 2021-12-04 at 08 32 40]
>
> These symbols don't exist in the x86_64-darwin binary.

cc @angerman is this a bug in GHC dead code elimination for arm/darwin?

hexagonal-sun commented 2 years ago

Note: ghcid does include the paths module: https://github.com/ndmitchell/ghcid/blob/b18ad1643f753f39e924909ecd957cb6b5a5fa89/ghcid.cabal#L75, although I can't see where it uses it. It looks like most packages use Paths to get the version of the package.

Same for ormolu: https://github.com/tweag/ormolu/blob/2c52cff9ea44d2f4214403056411035044d567ee/ormolu.cabal#L111

Update: I've confirmed that the Paths module is the issue. I've successfully built ormolu after applying the following patch:

diff --git a/app/Main.hs b/app/Main.hs
index 3d3498c..4f003a8 100644
--- a/app/Main.hs
+++ b/app/Main.hs
@@ -11,7 +11,7 @@ import Data.Bool (bool)
 import Data.List (intercalate, sort)
 import Data.Maybe (mapMaybe)
 import qualified Data.Text.IO as TIO
-import Data.Version (showVersion)
+import Data.Version (showVersion, makeVersion)
 import Development.GitRev
 import Options.Applicative
 import Ormolu
@@ -19,7 +19,6 @@ import Ormolu.Diff.Text (diffText, printTextDiff)
 import Ormolu.Parser (manualExts)
 import Ormolu.Terminal
 import Ormolu.Utils (showOutputable)
-import Paths_ormolu (version)
 import System.Exit (ExitCode (..), exitWith)
 import qualified System.FilePath as FP
 import System.IO (hPutStrLn, stderr)
@@ -139,7 +138,7 @@ optsParserInfo =
         "\n"
         [ unwords
             [ "ormolu",
-              showVersion version,
+              showVersion (makeVersion [1,0]),
               $gitBranch,
               $gitHash
             ],
diff --git a/ormolu.cabal b/ormolu.cabal
index 6b26b3b..6d54a5b 100644
--- a/ormolu.cabal
+++ b/ormolu.cabal
@@ -145,7 +145,6 @@ library
 executable ormolu
     main-is:          Main.hs
     hs-source-dirs:   app
-    other-modules:    Paths_ormolu
     default-language: Haskell2010
     build-depends:
         base >=4.12 && <5.0,

cideM commented 2 years ago

I just wanted to start debugging this with ghcid, using this flake:

{
  description = "Nix Flake template using the 'nixpkgs-unstable' branch and 'flake-utils'";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = import nixpkgs {
          overlays = [
            (self: super: {
              haskellPackages = super.haskellPackages // {
                ghcid = super.haskellPackages.ghcid.overrideAttrs (old: {
                });
              };
            })
          ];
          inherit system;
        };
      in
      {
        packages = flake-utils.lib.flattenTree {
          ghcid = pkgs.haskellPackages.ghcid;
        };
        devShell = pkgs.mkShell {
          buildInputs = with pkgs; [
            coreutils
            moreutils
            jq
          ];
        };
      }
    );
}

I'm now getting

$ nix build .#ghcid
building '/nix/store/jqfx8z69709v0mh0qg4ybiahiwzv5zlb-ghcid-0.8.7.drv'...
cycle detected in the references of '/nix/store/d8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc' from '/nix/store/d8vx9ix1iygqnvwdwqfvqbplfh5c1lxw-ghcid-0.8.7'
error: build of '/nix/store/jqfx8z69709v0mh0qg4ybiahiwzv5zlb-ghcid-0.8.7.drv' failed

on my M1 machine. Notice that the error is talking about the doc output.

$ rg 'd8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc' '/nix/store/d8vx9ix1iygqnvwdwqfvqbplfh5c1lxw-ghcid-0.8.7'

/nix/store/d8vx9ix1iygqnvwdwqfvqbplfh5c1lxw-ghcid-0.8.7/lib/ghc-8.10.7/package.conf.d/ghcid-0.8.7-1giAseyHcduH3vBLKcJ4Nu.conf
115:    /nix/store/d8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc/share/doc/ghcid-0.8.7/html/ghcid.haddock
118:    /nix/store/d8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc/share/doc/ghcid-0.8.7/html

more specifically from the .conf file in question:

 114   │ haddock-interfaces:
 115   │     /nix/store/d8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc/share/doc/ghcid-0.8.7/html/ghcid.haddock
 116   │
 117   │ haddock-html:
 118   │     /nix/store/d8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc/share/doc/ghcid-0.8.7/html

I get the same behavior when building the package directly from Nixpkgs at b6bf1ca717b0ead86769102c2e94d591cc45ee9b

shinzui commented 2 years ago

Is there any workaround?

srid commented 2 years ago

> Is there any workaround?

https://github.com/srid/haskell-template/blob/6ecc41c90bc063a4649694533090982bfa5ca47b/flake.nix#L19-L30

shinzui commented 2 years ago

Thank you, @srid.

domenkozar commented 2 years ago

@sternenseemann did you reconsider fixing this with a workaround? Many basic packages like niv are failing for aarch64-darwin users and having a big closure is a lot better than just plain build failure.

sternenseemann commented 2 years ago

My point still stands, but I suppose I could live with a workaround, as there doesn't seem to be anyone with the time to find a fix. It'd be great if someone could confirm this is an issue with GHC's unused symbol elimination and report this to upstream, though.

hexagonal-sun commented 2 years ago

I did mention this on GHC's IRC channel and it looks like it's a problem with the LLVM backend. Once we move to the NCG, it looks like this problem will be fixed. Another solution that was suggested is to use Cabal's relocatable option, which prevents paths from being set in the phantom Paths module. I'm not too sure how to enable this option in the Nix Haskell machinery, though.

JJJollyjim commented 2 years ago

It can be done with appendConfigureFlag "--enable-relocatable" in configuration-common.nix, but it doesn't seem to work:

CallStack (from HasCallStack):
  $, called at libraries/Cabal/Cabal/Distribution/Simple/Configure.hs:1993:11 in Cabal-3.2.1.0:Distribution.Simple.Configure
  checkRelocatable, called at libraries/Cabal/Cabal/Distribution/Simple/Configure.hs:783:17 in Cabal-3.2.1.0:Distribution.Simple.Configure
  configure, called at libraries/Cabal/Cabal/Distribution/Simple.hs:625:20 in Cabal-3.2.1.0:Distribution.Simple
  confHook, called at libraries/Cabal/Cabal/Distribution/Simple/UserHooks.hs:65:5 in Cabal-3.2.1.0:Distribution.Simple.UserHooks
  configureAction, called at libraries/Cabal/Cabal/Distribution/Simple.hs:180:19 in Cabal-3.2.1.0:Distribution.Simple
  defaultMainHelper, called at libraries/Cabal/Cabal/Distribution/Simple.hs:116:27 in Cabal-3.2.1.0:Distribution.Simple
  defaultMain, called at Setup.hs:2:8 in main:Main
Setup: Installation directories are not prefix_relative:
InstallDirs {prefix =
"/nix/store/y9j3v034yq8cjiqibkizb99vrv1ch4gf-niv-0.2.19", bindir =
"/nix/store/s5mfn44ks2bfln26syhxlr3w1xrs4j9v-niv-0.2.19-bin/bin", libdir =
Setup: internal error InstallDirs.libsubdir
CallStack (from HasCallStack):
  error, called at libraries/Cabal/Cabal/Distribution/Simple/InstallDirs.hs:142:18 in Cabal-3.2.1.0:Distribution.Simple.InstallDirs

It doesn't like the fact that niv-0.2.19-bin/bin is not inside niv-0.2.19/.

JJJollyjim commented 2 years ago

I think that relocatable only makes the paths relative, which is very much not a valid solution for our case. I think the "correct" option would be to patch Cabal such that the non-bin path functions are never generated in the paths module on split builds. This would cause a compile-time error if the application actually uses them (in which case, the correct solution is to disable split building for that package), otherwise it would have no effect, and would solve our issue here, regardless of which optimisations do or do not occur.

The path-generating code to patch is in https://github.com/haskell/cabal/blob/master/Cabal/src/Distribution/Simple/Build/PathsModule.hs (or https://github.com/haskell/cabal/blob/master/Cabal/src/Distribution/Simple/Build/PathsModule/Z.hs).

angerman commented 2 years ago

I'm fairly certain we should add a flag to Cabal to not generate Paths_ modules unless explicitly requested. It's an absolute misfeature, and one of the few places that completely break relocatability.

We'd likely want to either outright disable:

https://github.com/haskell/cabal/blob/c66a1260f86c9c2f16dc319604f766fd8f756045/Cabal/src/Distribution/Simple/Build.hs#L758-L782

or put it behind some flag --with-paths-module.

angerman commented 2 years ago

> > @sternenseemann Indeed, you are correct. I've just checked on my x86_64-darwin system and there is a reference from out -> bin but no reference from bin -> out, so it looks as though the ghcid binary is the problem. I've uploaded it here for other people to take a look at, but I'll see if I can figure out where the reference is coming from. The bad references can be seen with strings ghcid | grep wihaahmwj1hksyf1. ghcid-bin-aarch64-darwin.tar.gz Update: After throwing the binary into Ghidra to see where the references are coming from, it looks as though they are the values of the Haskell functions getLibDir, getDynLibDir, getDataDir, getLibexecDir and getSysconfDir:
> >
> > [Screenshot 2021-12-04 at 08 32 40]
> >
> > These symbols don't exist in the x86_64-darwin binary.
>
> cc @angerman is this a bug in GHC dead code elimination for arm/darwin?

I don't know what's generating the _bytes suffixes. But the LLVM backend on Darwin does not have dead code elimination. It simply can't, as Darwin's dead code elimination is done through the .subsections_via_symbols directive. However, we can't get LLVM to emit proper code to retain prefix data, which we use due to Tables Next To Code in GHC. This is one of the mismatches between what GHC needs and what LLVM provides. We could probably try to fix this up by hand-injecting extra symbols to tie prefix data to their functions, but mangling assembly is bad enough, and starting to do this across multiple architectures is even worse.

So yes, this will get better with the aarch64 NCG, but it's a generic issue on darwin if you use the LLVM codegen. The x86_64-darwin LLVM codegen will have the same issue.

domenkozar commented 2 years ago

So let's disable bin splitting for aarch64-darwin and revert that once we have GHC 9.2 as the default?

JJJollyjim commented 2 years ago

@angerman the issue here is that all three problem packages do actively use the paths module! The thing is, they use it solely to get their own version number, not to access any problematic paths.

(This is why I suggested patching out generation of the specific variables)

angerman commented 2 years ago

@JJJollyjim yes, I fully agree that patching it out is the correct solution. But the better solution would be to outright prevent packages from doing this kind of garbage.

angerman commented 2 years ago

Let me reiterate my opinion: the Paths_ module was and is a bad idea and should never have been conceived. It should be outright removed and packages need to be fixed properly.

JJJollyjim commented 2 years ago

What's the "fixed" way to get your own version number?

angerman commented 2 years ago

We could probably stick it into cabal_macros.h?

sternenseemann commented 2 years ago

Why not? Referencing certain paths is completely fine, e. g. the data output is often necessary to reference. That unused values end up in the binary is the problem.

In any case, seems like we are stuck with this problem until 9.2.1 (which is going to be quite some time for us).

angerman commented 2 years ago

9.2.2, 9.2.1 is a dud.

angerman commented 2 years ago

If you need to access data, build a wrapper script around your executable and configure your data path through some env var. If you need to look it up relative to your executable, we can do this today on most operating systems. GHC has the same issue: it needs to look up its lib path.

Baking in paths breaks relocatability. It's a bad, bad idea. The only case where I have some sympathy for this is, again, operating systems that can't find their executable's absolute path. If you are a distribution that ends up splitting bin and lib or share apart anyway, you'll need to pass that information to the executable one way or the other. A common way to do this is to wrap the executable in a shell script with an env var telling the executable where to find its data dir.

Again, the Paths_ module is a terrible solution.
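
The lookup order described here (environment variable first, then a path relative to the executable) might look like the following in Python. Everything named in it is a placeholder for illustration: MYTOOL_DATADIR, the share/mytool layout and the data_dir function are assumptions, not something Cabal or nixpkgs provides.

```python
import os
import sys


def data_dir(env_var="MYTOOL_DATADIR", environ=None, executable=None):
    """Locate the data directory without baking an absolute path into the binary.

    1. If a wrapper script exported env_var, trust it.
    2. Otherwise derive the directory from the executable's own location,
       assuming a <prefix>/bin/<exe> and <prefix>/share/mytool layout.
    """
    environ = os.environ if environ is None else environ
    override = environ.get(env_var)
    if override:
        return override
    exe = os.path.realpath(executable or sys.argv[0])
    prefix = os.path.dirname(os.path.dirname(exe))
    return os.path.join(prefix, "share", "mytool")
```

In a split-output setup only the first branch would be reliable, because bin and share live under different store paths; the wrapper has to set the variable.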

hexagonal-sun commented 2 years ago

I've been looking at this today and I've got a patch that prevents cabal creating the problematic Path symbols above. GHC built fine and I can confirm this fixes the build for ghcid. However the following packages couldn't be built:

While this does fix the build for ghcid, it does break the build for other packages. The only other thing that I could think of doing is creating a dummy value for getDataDir so that it does exist, but will fail when it's actually used.

sternenseemann commented 2 years ago

It's crucial that data dir works and the reference incurred due to it is unproblematic.

srid commented 2 years ago

Shouldn't the contents of data-dir be made its own derivation, so the bin derivation can reference it (and only it)?

sternenseemann commented 2 years ago

They are installed to a separate output.

hexagonal-sun commented 2 years ago

> It's crucial that data dir works and the reference incurred due to it is unproblematic.

In which case, I think we're stuck, since the data dir is one of the symbols creating a cycle.

sternenseemann commented 2 years ago

Not necessarily, data files are installed to a different output, so I guess we could try something that compares the string to the prefix and only emits it if they differ (or rather only emits paths if they are different from the prefix).
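
A rough Python sketch of that comparison, assuming it means "emit a directory only if it does not live under the out prefix". The function name dirs_to_emit is hypothetical, the store paths are the niv ones from the build log in this issue, and the subdirectory under the data output is an assumption made for the example.

```python
import posixpath

# Store paths of the niv outputs, copied from the build log in this issue.
PREFIX = "/nix/store/v5p7lliij8f10c920sa8lwnqv8di6007-niv-0.2.19"
INSTALL_DIRS = {
    "bindir": "/nix/store/abawbski981061gbsnq0v5374gvsrbq7-niv-0.2.19-bin/bin",
    "libdir": PREFIX + "/lib/ghc-8.10.7/aarch64-osx-ghc-8.10.7",
    # The share/... suffix below is assumed for illustration.
    "datadir": "/nix/store/hkapz2a124mmzc3x1mw2dqvds3bhkfg4-niv-0.2.19-data/share/niv-0.2.19",
    "sysconfdir": PREFIX + "/etc",
}


def dirs_to_emit(prefix, install_dirs):
    """Keep only the install dirs that are not inside the out prefix.

    Dirs inside the prefix are the ones that would make the executable
    point back at the library output, so they are dropped.
    """
    prefix = posixpath.normpath(prefix)
    kept = {}
    for name, path in install_dirs.items():
        norm = posixpath.normpath(path)
        inside = norm == prefix or norm.startswith(prefix + "/")
        if not inside:
            kept[name] = path
    return kept
```

Here bindir and datadir survive because they live in the bin and data outputs, while libdir and sysconfdir are dropped.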

hexagonal-sun commented 2 years ago

I've got a partial fix at #154046.

angerman commented 2 years ago

I guess discussing this in detail on the Matrix Nix Haskell channel and not bringing that discussion back here left out some details.

@hexagonal-sun the approach is good. You can keep all directives, just modify the mkEnvOrReloc call to only emit the env var lookup. Then patch the install phase to emit a wrapper around the binary that sets the absolute path of the data dir via the env var.

sternenseemann commented 2 years ago

I would not like that, it's a far too invasive change just to work around GHC's missing dead code elimination in the default backend for a single platform. This would also require adding a ton of new stuff to the generic builder, and we still wouldn't eliminate the reference cycle for free.

angerman commented 2 years ago

@sternenseemann just to be clear here: GHC doesn't do any dead code elimination by itself. It relies on the linker to do so and tries to produce code that's kinda dead-strippable. But there is no major logic in GHC to do DCE outside of split-sections for ELF and subsections_via_symbols for Mach-O.

The right fix imho is to fix cabal to not be an utterly idiotic tool with its stupid Paths_ module.

Relying on DCE to fix your broken code is not the right strategy. Fixing the broken code from the get-go would be.