walkah closed this issue 1 year ago
See also #140180 for the same issue. Seems like Haskell packages with separate bin outputs just don't work on aarch64-darwin for some reason, likely codesigning?!
@NixOS/darwin-maintainers please can someone look into this issue? I literally have no way of debugging this.
I attempted to build niv with enableSeparateDataOutput = false; and still get the same error. I haven't touched haskellPackages before and I'm extremely unfamiliar with how the outputs are created and how they relate to each other in a way that creates a loop.
Setting enableSeparateBinOutput = false; (bin, not data) worked for me, but I'm also quite unfamiliar with how things work, so it was a bit of a guess.
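For anyone who wants to try the same workaround without editing Nixpkgs, here is a rough sketch of how it might be expressed as an overlay. The shell snippet only writes the overlay file; the file name no-separate-bin.nix and the OVERLAY_FILE variable are made up for illustration, and ghcid stands in for whichever package is failing. It assumes haskell.lib.overrideCabal accepts enableSeparateBinOutput like any other argument of the Haskell builder.

```shell
#!/bin/sh
# Hypothetical location for the overlay; override with OVERLAY_FILE if desired.
overlay_file="${OVERLAY_FILE:-${TMPDIR:-/tmp}/no-separate-bin.nix}"

# Write an overlay that turns off the separate bin output for one package.
cat > "$overlay_file" <<'EOF'
self: super: {
  haskellPackages = super.haskellPackages.override {
    overrides = hself: hsuper: {
      # overrideCabal changes the arguments passed to the Haskell builder,
      # which is where the output split is decided.
      ghcid = super.haskell.lib.overrideCabal hsuper.ghcid (old: {
        enableSeparateBinOutput = false;
      });
    };
  };
}
EOF

echo "wrote $overlay_file"
```

The resulting file could then be passed to Nixpkgs as overlays = [ (import ./no-separate-bin.nix) ]. The cost is a larger closure, since the executable and the library end up in the same output.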
Well, that's an obvious workaround; what I'm more interested in is why bin outputs in Haskell packages create cyclical dependencies between outputs only on aarch64-darwin.
Which references are causing the cycles? Is it something like an extra codesigning binary in bin which is also used from out?
I'm seeing this issue in a Nix shell with ghcid and ormolu. I'll look into this a little more in the evening, but at least to me it seems like this affects more than just a single package.
Yes, every Haskell package with a separate bin output is broken at the moment. Would be great to find out what actually causes the reference cycle.
I think we should change the title of the issue to reflect the broader scope. I tried bisecting on my M1 machine earlier, but build times are brutal for all the Haskell machinery. I might have to try a different approach later on. I did check whether otool -l path/to/ghcid-bin references anything from the ghcid lib output, but it did not. Which is strange, because then what's responsible for the circular reference? Anyhow, will look into this more later.
I can upload the failed build directory of ormolu (a failing haskell package) being built on aarch64-darwin if that helps?
This is of limited use because it doesn't really tell you what is wrong in the installed outputs. I'd expect that the references get introduced in fixupPhase, which is after installation… I guess you could grep for the value of niv.bin.outPath and niv.out.outPath in the build directory and see if it turns up something interesting.
This nix feature would be useful right about now…
If I do: nix show-derivation /nix/store/120xvbzmkglal2zgq43bx8n82cn3q8h5-ghcid-0.8.7.drv
I can see the output directories in the store:
{
  "/nix/store/120xvbzmkglal2zgq43bx8n82cn3q8h5-ghcid-0.8.7.drv": {
    "outputs": {
      "bin": {
        "path": "/nix/store/781rfvh4fb9wjz33zr5y1nfli1fv5yzp-ghcid-0.8.7-bin"
      },
      "doc": {
        "path": "/nix/store/27ik61ijjsgjga35g5n021l8j30fqf6g-ghcid-0.8.7-doc"
      },
      "out": {
        "path": "/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7"
      }
    },
    [...]
Those directories do have the built binaries and libs in there. Would that be of more use? I.e., would the fixupPhase have occurred at that point?
I've been doing a bit more digging on this, and after running nix build --debug --keep-failed ".#ghcid", I found the following:
scanning for references for output 'bin' in temp location '/nix/store/781rfvh4fb9wjz33zr5y1nfli1fv5yzp-ghcid-0.8.7-bin'
found reference to 'w8079fmkwpblbcpj0vyaaj49l1ib8q95' at offset '2043'
found reference to 'hchnpr6dwnq47c2m5mm7i65wxx3cdhmi' at offset '2147'
found reference to 'f42zzy8y0fjc2fsfxs4ylpd6wxzgmx04' at offset '2251'
found reference to 'l0rp423b2w8b065w5cdx4s3hanjx464x' at offset '2431'
found reference to '781rfvh4fb9wjz33zr5y1nfli1fv5yzp' at offset '45769'
found reference to 'wihaahmwj1hksyf117dqvi6s249dg23v' at offset '45846' [lib output]
scanning for references for output 'doc' in temp location '/nix/store/27ik61ijjsgjga35g5n021l8j30fqf6g-ghcid-0.8.7-doc'
found reference to 'rffixa56qdb6vrrn85m9434r6nnyz7nc' at offset '2029'
found reference to 'hwzhi4m3mr6y3raq12xhf3wqi36qpf08' at offset '3588'
found reference to 'zh7w4sk7kkry40gp02hry61fhgrfagsw' at offset '5388'
found reference to '43zgrl31xb6brandd5vn729ijfdnq7qj' at offset '8128'
found reference to '781rfvh4fb9wjz33zr5y1nfli1fv5yzp' at offset '11537'
scanning for references for output 'out' in temp location '/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7'
found reference to 'hrwkdw342mxc1y5a0czf5x8zsrn7svp0' at offset '402'
found reference to '781rfvh4fb9wjz33zr5y1nfli1fv5yzp' at offset '1668' [bin output]
found reference to 'wihaahmwj1hksyf117dqvi6s249dg23v' at offset '2944'
found reference to 'w8079fmkwpblbcpj0vyaaj49l1ib8q95' at offset '2319'
found reference to 'hchnpr6dwnq47c2m5mm7i65wxx3cdhmi' at offset '3327'
found reference to 'f42zzy8y0fjc2fsfxs4ylpd6wxzgmx04' at offset '3791'
[...]
I'm not too sure how these references are found, but I presume that it's doing something similar to running strings across the binary and looking for a valid Nix store hash. If that's the case, this is the reference from bin to out:
@platoon:/n/s/7/bin🔒 ❯ strings ghcid | grep store
/nix/store/w8079fmkwpblbcpj0vyaaj49l1ib8q95-libiconv-50/lib/libiconv.dylib
/nix/store/hchnpr6dwnq47c2m5mm7i65wxx3cdhmi-gmp-6.2.1/lib/libgmp.10.dylib
/nix/store/f42zzy8y0fjc2fsfxs4ylpd6wxzgmx04-libffi-3.4.2/lib/libffi.8.dylib
/nix/store/l0rp423b2w8b065w5cdx4s3hanjx464x-apple-framework-CoreFoundation-11.0.0/Library/Frameworks
/nix/store/781rfvh4fb9wjz33zr5y1nfli1fv5yzp-ghcid-0.8.7-bin/bin
[references to out follow]
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/lib/ghc-8.10.7/aarch64-osx-ghc-8.10.7/ghcid-0.8.7-5ZcL75a2atNERQtuLKeyuL-ghcid
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/lib/ghc-8.10.7/aarch64-osx-ghc-8.10.7
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/share/aarch64-osx-ghc-8.10.7/ghcid-0.8.7
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/libexec/aarch64-osx-ghc-8.10.7/ghcid-0.8.7
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/etc
The reference from out to bin:
@platoon:/n/s/w/l/g/aarch64-osx-ghc-8.10.7🔒 ❯ strings libHSghcid-0.8.7-Eq9Lb7GjR4QBjEa2ZCK5Kw-ghc8.10.7.dylib | grep store
[...]
/nix/store/hrwkdw342mxc1y5a0czf5x8zsrn7svp0-ghc-8.10.7/lib/ghc-8.10.7/unix-2.7.2.2
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/lib/links
/nix/store/l0rp423b2w8b065w5cdx4s3hanjx464x-apple-framework-CoreFoundation-11.0.0/Library/Frameworks
[reference to bin follows]
/nix/store/781rfvh4fb9wjz33zr5y1nfli1fv5yzp-ghcid-0.8.7-bin/bin
/nix/store/wihaahmwj1hksyf117dqvi6s249dg23v-ghcid-0.8.7/lib/ghc-8.10.7/aarch64-osx-ghc-8.10.7/ghcid-0.8.7-Eq9Lb7GjR4QBjEa2ZCK5Kw
Having a reference from bin -> out makes sense; the more curious case is the reference from out -> bin. Strangely, it looks as though it's only referencing the bin folder, not the actual ghcid executable.
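The check done by hand above with strings and grep can be approximated with a few shell functions. This is only a sketch under the assumption made in the comment: that a reference is any occurrence of another output's 32-character store hash somewhere in a file. The function names hash_part, refers_to and report_cycles are invented for this example, and only mutual references between two outputs are reported, not longer cycles.

```shell
#!/bin/sh
# hash_part PATH: print the 32-character hash of a store path such as
# /nix/store/<hash>-<name>.
hash_part() {
  basename "$1" | cut -c1-32
}

# refers_to FROM TO: succeed if any file under FROM contains TO's hash.
# -F matches the hash literally, -a treats binaries as text,
# -r recurses into directories and -q stops at the first hit.
refers_to() {
  grep -r -a -q -F -e "$(hash_part "$2")" "$1"
}

# report_cycles OUT...: print every pair of outputs that reference each other.
report_cycles() {
  while [ "$#" -gt 1 ]; do
    a=$1
    shift
    # Compare the current output against every output after it,
    # so each unordered pair is checked exactly once.
    for b in "$@"; do
      if refers_to "$a" "$b" && refers_to "$b" "$a"; then
        echo "cycle: $a <-> $b"
      fi
    done
  done
}
```

Called with the bin, doc and out paths of a kept failed build, report_cycles would print one line per pair of outputs that mention each other's hash.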
Actually, the bin -> out reference is probably the anomaly, since the executable should be statically linked (against the Haskell libraries; system deps are linked dynamically) and thus contain no reference to the library.
I wonder how the reference comes into existence…
@sternenseemann Indeed, you are correct. I've just checked on my x86_64-darwin system and there is a reference from out -> bin but no reference from bin -> out, so it looks as though the ghcid binary is the problem. I've uploaded it here for other people to take a look at, but I'll see if I can figure out where the reference is coming from. The bad references can be seen with strings ghcid | grep wihaahmwj1hksyf1.
ghcid-bin-aarch64-darwin.tar.gz
Update: After throwing the binary into Ghidra to see where the references are coming from, it looks as though they are the values of the Haskell functions getLibDir, getDynLibDir, getDataDir, getLibexecDir and getSysconfDir. These symbols don't exist in the x86_64-darwin binary.
Maybe we can just make bin output a no-op on aarch64-darwin until it's fixed?
I'm against this for the simple reason that it eliminates any incentive to fix the bug. Separate bin outputs are quite important as they significantly reduce the closure size: Haskell binaries are statically linked, so the download of the libraries plus the entire dependency closure is unnecessary if you only want to execute a tool.
Given Rosetta, there is a decent enough workaround for the failing builds.
> Separate bin outputs are quite important as they significantly reduce the closure size: Haskell binaries are statically linked, so the download of the libraries plus the entire dependency closure is unnecessary if you only want to execute a tool.
That's the incentive :)
cc @angerman is this a bug in GHC dead tree elimination for arm/darwin?
Note: ghcid does include the paths module: https://github.com/ndmitchell/ghcid/blob/b18ad1643f753f39e924909ecd957cb6b5a5fa89/ghcid.cabal#L75, although I can't see where it uses it. It looks like most packages use Paths to get the version of the package.
Same for ormolu: https://github.com/tweag/ormolu/blob/2c52cff9ea44d2f4214403056411035044d567ee/ormolu.cabal#L111
Update: I've confirmed that the Paths module is the issue. I've successfully built ormolu after applying the following patch:
diff --git a/app/Main.hs b/app/Main.hs
index 3d3498c..4f003a8 100644
--- a/app/Main.hs
+++ b/app/Main.hs
@@ -11,7 +11,7 @@ import Data.Bool (bool)
import Data.List (intercalate, sort)
import Data.Maybe (mapMaybe)
import qualified Data.Text.IO as TIO
-import Data.Version (showVersion)
+import Data.Version (showVersion, makeVersion)
import Development.GitRev
import Options.Applicative
import Ormolu
@@ -19,7 +19,6 @@ import Ormolu.Diff.Text (diffText, printTextDiff)
import Ormolu.Parser (manualExts)
import Ormolu.Terminal
import Ormolu.Utils (showOutputable)
-import Paths_ormolu (version)
import System.Exit (ExitCode (..), exitWith)
import qualified System.FilePath as FP
import System.IO (hPutStrLn, stderr)
@@ -139,7 +138,7 @@ optsParserInfo =
"\n"
[ unwords
[ "ormolu",
- showVersion version,
+ showVersion (makeVersion [1,0]),
$gitBranch,
$gitHash
],
diff --git a/ormolu.cabal b/ormolu.cabal
index 6b26b3b..6d54a5b 100644
--- a/ormolu.cabal
+++ b/ormolu.cabal
@@ -145,7 +145,6 @@ library
executable ormolu
main-is: Main.hs
hs-source-dirs: app
- other-modules: Paths_ormolu
default-language: Haskell2010
build-depends:
base >=4.12 && <5.0,
I just wanted to start debugging this using ghcid and this flake:
{
  description = "Nix Flake template using the 'nixpkgs-unstable' branch and 'flake-utils'";
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-unstable";
    flake-utils.url = "github:numtide/flake-utils";
  };
  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = import nixpkgs {
          overlays = [
            (self: super: {
              haskellPackages = super.haskellPackages // {
                ghcid = super.haskellPackages.ghcid.overrideAttrs (old: {
                });
              };
            })
          ];
          inherit system;
        };
      in
      {
        packages = flake-utils.lib.flattenTree {
          ghcid = pkgs.haskellPackages.ghcid;
        };
        devShell = pkgs.mkShell {
          buildInputs = with pkgs; [
            coreutils
            moreutils
            jq
          ];
        };
      }
    );
}
I'm now getting
$ nix build .#ghcid
building '/nix/store/jqfx8z69709v0mh0qg4ybiahiwzv5zlb-ghcid-0.8.7.drv'...
cycle detected in the references of '/nix/store/d8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc' from '/nix/store/d8vx9ix1iygqnvwdwqfvqbplfh5c1lxw-ghcid-0.8.7'
error: build of '/nix/store/jqfx8z69709v0mh0qg4ybiahiwzv5zlb-ghcid-0.8.7.drv' failed
on my M1 machine. Notice that the error is talking about the doc output.
$ rg 'd8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc' '/nix/store/d8vx9ix1iygqnvwdwqfvqbplfh5c1lxw-ghcid-0.8.7'
/nix/store/d8vx9ix1iygqnvwdwqfvqbplfh5c1lxw-ghcid-0.8.7/lib/ghc-8.10.7/package.conf.d/ghcid-0.8.7-1giAseyHcduH3vBLKcJ4Nu.conf
115: /nix/store/d8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc/share/doc/ghcid-0.8.7/html/ghcid.haddock
118: /nix/store/d8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc/share/doc/ghcid-0.8.7/html
More specifically, from the .conf file in question:
114 │ haddock-interfaces:
115 │ /nix/store/d8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc/share/doc/ghcid-0.8.7/html/ghcid.haddock
116 │
117 │ haddock-html:
118 │ /nix/store/d8avrsjkash30pwfcv7imz4601zh08dv-ghcid-0.8.7-doc/share/doc/ghcid-0.8.7/html
I get the same behavior when building the package directly from Nixpkgs at b6bf1ca717b0ead86769102c2e94d591cc45ee9b
Is there any workaround?
Thank you, @srid.
@sternenseemann did you reconsider fixing this with a workaround? Many basic packages like niv are failing for aarch64-darwin users and having a big closure is a lot better than just plain build failure.
My point still stands, but I suppose I could live with a workaround, as there doesn't seem to be anyone with the time to find a fix. It'd be great if someone could confirm this is an issue with GHC's unused symbol elimination and report this to upstream, though.
I did mention this on GHC's IRC channel, and it looks like it's a problem with the LLVM back-end. Once we move to the NCG, it looks like this problem will be fixed. Another solution that was suggested is to use Cabal's relocatable option. That prevents paths from being set in the phantom Paths module. I'm not too sure how to enable this option in the Nix Haskell machinery, though.
It can be done with appendConfigureFlag "--enable-relocatable" in configuration-common.nix, but it doesn't seem to work:
CallStack (from HasCallStack):
$, called at libraries/Cabal/Cabal/Distribution/Simple/Configure.hs:1993:11 in Cabal-3.2.1.0:Distribution.Simple.Configure
checkRelocatable, called at libraries/Cabal/Cabal/Distribution/Simple/Configure.hs:783:17 in Cabal-3.2.1.0:Distribution.Simple.Configure
configure, called at libraries/Cabal/Cabal/Distribution/Simple.hs:625:20 in Cabal-3.2.1.0:Distribution.Simple
confHook, called at libraries/Cabal/Cabal/Distribution/Simple/UserHooks.hs:65:5 in Cabal-3.2.1.0:Distribution.Simple.UserHooks
configureAction, called at libraries/Cabal/Cabal/Distribution/Simple.hs:180:19 in Cabal-3.2.1.0:Distribution.Simple
defaultMainHelper, called at libraries/Cabal/Cabal/Distribution/Simple.hs:116:27 in Cabal-3.2.1.0:Distribution.Simple
defaultMain, called at Setup.hs:2:8 in main:Main
Setup: Installation directories are not prefix_relative:
InstallDirs {prefix =
"/nix/store/y9j3v034yq8cjiqibkizb99vrv1ch4gf-niv-0.2.19", bindir =
"/nix/store/s5mfn44ks2bfln26syhxlr3w1xrs4j9v-niv-0.2.19-bin/bin", libdir =
Setup: internal error InstallDirs.libsubdir
CallStack (from HasCallStack):
error, called at libraries/Cabal/Cabal/Distribution/Simple/InstallDirs.hs:142:18 in Cabal-3.2.1.0:Distribution.Simple.InstallDirs
It doesn't like the fact that niv-0.2.19-bin/bin is not inside niv-0.2.19/.
I think that relocatable only makes the paths relative, which is very much not a valid solution for our case. I think the "correct" option would be to patch Cabal such that the non-bin path functions are never generated in the paths module on split builds. This would cause a compile-time error if the application actually uses them (in which case, the correct solution is to disable split building for that package), otherwise it would have no effect, and would solve our issue here, regardless of which optimisations do or do not occur.
The path-generating code to patch is in https://github.com/haskell/cabal/blob/master/Cabal/src/Distribution/Simple/Build/PathsModule.hs (or https://github.com/haskell/cabal/blob/master/Cabal/src/Distribution/Simple/Build/PathsModule/Z.hs).
I'm fairly certain we should add a flag to cabal to not generate Paths_ modules unless explicitly requested. It's an absolute misfeature, and one of the few places that completely break relocatability.
We'd likely want to either outright disable it or put it behind some flag like --with-paths-module.
> cc @angerman is this a bug in GHC dead tree elimination for arm/darwin?
I don't know what's generating _bytes suffixes. But the LLVM backend on Darwin does not have dead code elimination. It simply can't, as Darwin's dead code elimination is done through the .subsections_via_symbols directive. However, we can't get LLVM to emit proper code to retain prefix data, which we use due to Tables Next To Code in GHC. This is one of the mismatches between what GHC needs and what LLVM provides. We could probably try to fix this up by hand-injecting extra symbols to tie prefix data to their functions, but mangling assembly is bad enough, and starting to do this across multiple architectures is even worse.
So yes, this will get better with the aarch64 NCG, but it's a generic issue on darwin if you use the LLVM codegen. The x86_64-darwin LLVM codegen will have the same issue.
So let's disable bin splitting for aarch64-darwin and revert that once we have GHC 9.2 as the default?
@angerman the issue here is that all three problem packages do actively use the paths module! The thing is, they use it solely to get their own version number, not to access any problematic paths.
(This is why I suggested patching out generation of the specific variables)
@JJJollyjim yes, I fully agree that patching it out is the correct solution. But the better solution would be to outright prevent packages from doing this kind of garbage.
Let me reiterate my opinion: the Paths_ module was and is a bad idea and should never have been conceived. It should be outright removed and packages need to be fixed properly.
What's the "fixed" way to get your own version number?
We could probably stick it into cabal_macros.h?
Why not? Referencing certain paths is completely fine, e.g. the data output is often necessary to reference. That unused values end up in the binary is the problem.
In any case, seems like we are stuck with this problem until 9.2.1 (which is going to be quite some time for us).
9.2.2, 9.2.1 is a dud.
If you need to access data, build a wrapper script around your executable, and configure your DATA path through some env var. If you need to look it up relative to your executable, we can do this today on most operating systems. GHC has the same issue: it needs to look up its lib path.
Baking in paths breaks relocatability. It's a bad, bad idea. The only case where I have some sympathy for this is, again, operating systems that can't find their executable's absolute path. If you are a distribution that ends up splitting bin and lib or share apart anyway, you'll need to pass that information to the executable one way or the other. A common way to do this is to wrap the executable in a shell script with an ENV var telling the executable where to find its data dir.
Again, the Paths_ module is a terrible solution.
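A wrapper of the kind described here could look roughly like the sketch below. The function name write_env_wrapper is invented for this example (Nixpkgs ships its own makeWrapper helper for the same job), and the variable name ghcid_datadir in the usage note assumes that the Cabal-generated Paths_ module consults an environment variable named after the package with a _datadir suffix, which is worth verifying for the Cabal version in use. Values containing single quotes are not handled.

```shell
#!/bin/sh
# write_env_wrapper REAL_EXE WRAPPER VAR VALUE
# Writes a small script at WRAPPER that exports VAR=VALUE and then
# replaces itself with REAL_EXE, passing all arguments through unchanged.
write_env_wrapper() {
  real_exe=$1 wrapper=$2 var=$3 value=$4
  # The heredoc delimiter is unquoted so the three variables expand now;
  # \$@ is escaped so that it is expanded when the wrapper runs.
  cat > "$wrapper" <<EOF
#!/bin/sh
export $var='$value'
exec '$real_exe' "\$@"
EOF
  chmod +x "$wrapper"
}
```

For example, write_env_wrapper /path/to/real/ghcid /path/to/bin/ghcid ghcid_datadir /path/to/data/share would leave the data path out of the executable itself and put it in the wrapper instead.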
I've been looking at this today and I've got a patch that prevents cabal creating the problematic Paths symbols above. GHC built fine and I can confirm this fixes the build for ghcid. However, the following packages couldn't be built:
niv - This fails on building crypto-api-tests, as it has a data-files directive in the cabal file. Since we've removed the generation of getDataDir, this causes a compilation failure with getDataFileName.
ormolu - This fails on building happy for the same reason as above. In this case, it's when happy is trying to emit a template.
While this does fix the build for ghcid, it does break the build for other packages. The only other thing that I could think of doing is creating a dummy value for getDataDir so that it does exist, but will fail when it's actually used.
It's crucial that data dir works and the reference incurred due to it is unproblematic.
Shouldn't the contents of data-dir be made its own derivation, so the bin derivation can reference it (and only it)?
They are installed to a separate output.
> It's crucial that data dir works and the reference incurred due to it is unproblematic.
In which case, I think we're stuck, since the data dir is one of the symbols creating a cycle.
Not necessarily, data files are installed to a different output, so I guess we could try something that compares the string to the prefix and only emits it if they differ (or rather only emits paths if they are different from the prefix).
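The comparison being proposed can be illustrated with a small shell sketch. It is not the Cabal patch itself, only the decision rule as I read it: a directory is emitted when it lies outside the main output's prefix, and dropped when it lies under it. The function names under_prefix and emit_if_outside are made up for this example.

```shell
#!/bin/sh
# under_prefix PATH PREFIX: succeed if PATH is PREFIX itself or lies below it.
# Requiring a "/" after the prefix keeps "<prefix>-bin" from matching.
under_prefix() {
  case "$1" in
    "$2"|"$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# emit_if_outside NAME PATH PREFIX: print "NAME=PATH" only when PATH is not
# under PREFIX, so nothing pointing back into the main output is baked in.
emit_if_outside() {
  if ! under_prefix "$2" "$3"; then
    echo "$1=$2"
  fi
}
```

With the ghcid paths from earlier in the thread, a libdir under the out prefix would be dropped, while a bindir or a data dir living in its own output would still be emitted.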
I've got a partial fix at #154046.
Guess talking about this in detail on the Matrix Nix Haskell channel and not bringing that discussion back left out some details.
@hexagonal-sun the approach is good. You can keep all directives, just modify the mkEnvOrReloc call to only emit the EnvVar lookup. Then patch the install phase to emit a wrapper around the binary, setting the absolute path for the data dir via the env var.
I would not like that; it's a far too invasive change just to work around the fact that GHC is missing dead tree elimination in the default backend for a single platform. This would also require adding a ton of new stuff to the generic builder, and we still wouldn't eliminate the reference cycle for free.
@sternenseemann just to be clear here. Ghc doesn't do any dead code elimination by itself. It relies on the linker to do so and tries to produce code that's kinda dead-strippable. But there is no major logic in GHC to do DCE outside of split-sections for ELF and subsections_via_symbols for Mach-O.
The right fix imho is to fix cabal to not be an utterly idiotic tool with its stupid Paths_ module.
Relying on DCE to fix your broken code is not the right strategy. Fixing the broken code from the get-go would be.
Describe the bug
niv fails to build on aarch64-darwin with the following errors:
Recent hydra builds also fail: https://hydra.nixos.org/job/nixpkgs/nixpkgs-unstable-aarch64-darwin/niv.aarch64-darwin
Steps To Reproduce
Steps to reproduce the behavior:
nix-shell -p niv
Expected behavior
niv should install and be available.
Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.
Maintainer information: