NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

GHC 9.8 TemplateHaskell doesn't work in pkgsStatic #275304

Open bgamari opened 9 months ago

bgamari commented 9 months ago

Describe the bug

Haskell packages in nixpkgs.pkgsStatic.haskell.packages.ghc98 cannot be built.

Steps To Reproduce

Steps to reproduce the behavior:

  1. nix build nixpkgs#legacyPackages.x86_64-linux.pkgsStatic.haskell.packages.ghc98.Diff

Expected behavior

Diff is built, linking against musl.

Observed behavior

nix-repl> :b legacyPackages.x86_64-linux.pkgsStatic.haskell.packages.ghc98.Diff
error: build of '/nix/store/ickzn6az4asiwrh1wq20blm0dvcr5w17-Diff-static-x86_64-unknown-linux-musl-0.4.1.drv' on 'ssh://ben@maurer.local' failed: builder for '/nix/store/ickzn6az4asiwrh1wq20blm0dvcr5w17-Diff-static-x86_64-unknown-linux-musl-0.4.1.drv' failed with exit code 1;
       last 10 log lines:
       >    | ^^^^^^^^^^^^^^^^^^^^^^^^^^
       >
       > src/Data/Algorithm/Diff.hs:29:1: error: [GHC-47808]
       >     Failed to load dynamic interface file for Data.Array:
       >       Exception when reading interface file  /nix/store/zb7g1q1vza1x0fmb8qk8cv9y23b9w81g-x86_64-unknown-linux-musl-ghc-9.8.1/lib/x86_64-unknown-linux-musl-ghc-9.8.1/lib/../lib/x86_64-linux-ghc-9.8.1/array-0.5.6.0-inplace/Data/Array.dyn_hi
       >         /nix/store/zb7g1q1vza1x0fmb8qk8cv9y23b9w81g-x86_64-unknown-linux-musl-ghc-9.8.1/lib/x86_64-unknown-linux-musl-ghc-9.8.1/lib/../lib/x86_64-linux-ghc-9.8.1/array-0.5.6.0-inplace/Data/Array.dyn_hi: withBinaryFile: does not exist (No such file or directory)
       >    |
       > 29 | import Data.Array (listArray, (!))
       >    | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       > load' failed
       For full logs, run 'nix log /nix/store/ickzn6az4asiwrh1wq20blm0dvcr5w17-Diff-static-x86_64-unknown-linux-musl-0.4.1.drv'.

Additional context

The problem here appears to manifest during building of Haddock documentation. For instance,

$ nix repl
nix-repl> :lf nixpkgs#
nix-repl> :b legacyPackages.x86_64-linux.haskell.lib.dontHaddock (legacyPackages.x86_64-linux.pkgsStatic.haskell.packages.ghc98.Diff)

This derivation produced the following outputs:
  out -> /nix/store/5096rq1664b9qq2f2g9ih0sv6is48d8z-Diff-static-x86_64-unknown-linux-musl-0.4.1
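
For use outside the REPL, the same workaround can be written as a one-off expression. This is only a sketch: dont_haddock_expr is a hypothetical helper name (not part of nixpkgs), and it assumes <nixpkgs> resolves via NIX_PATH to a checkout that contains the requested GHC package set.

```shell
# Hypothetical helper: print a Nix expression that applies
# haskell.lib.dontHaddock to a package from a pkgsStatic Haskell package set.
#   $1 = package set name (e.g. ghc98), $2 = package name (e.g. Diff)
dont_haddock_expr() {
  printf 'with import <nixpkgs> {}; haskell.lib.dontHaddock pkgsStatic.haskell.packages.%s.%s\n' "$1" "$2"
}

# Intended usage (not executed here; requires Nix and a nixpkgs checkout):
#   nix-build -E "$(dont_haddock_expr ghc98 Diff)"
```

Keeping the expression in a function makes it easy to try the same workaround against other packages or GHC versions mentioned in this thread.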

Notify maintainers

@nh2

bgamari commented 9 months ago

My suspicion here is that this is either a Cabal or Haddock bug, although I'm not yet sure which.

angerman commented 9 months ago

Of note: something assumes the existence of dynamic files, while there are none. Not quite sure why GHC would try to load dynamic files. Trying to read

/nix/store/s13v3xsi60z627ic821fm70mlw43a3za-x86_64-unknown-linux-musl-ghc-9.8.1/lib/x86_64-unknown-linux-musl-ghc-9.8.1/lib/../lib/x86_64-linux-ghc-9.8.1/array-0.5.6.0-inplace/Data/Array.dyn_hi

however

/nix/store/s13v3xsi60z627ic821fm70mlw43a3za-x86_64-unknown-linux-musl-ghc-9.8.1/lib/x86_64-unknown-linux-musl-ghc-9.8.1/lib/../lib/x86_64-linux-ghc-9.8.1/array-0.5.6.0-inplace/Data
total 27K
dr-xr-xr-x 3 root root    5 Jan  1  1970 .
dr-xr-xr-x 3 root root    7 Jan  1  1970 ..
dr-xr-xr-x 6 root root   22 Jan  1  1970 Array
-r--r--r-- 2 root root 2.9K Jan  1  1970 Array.hi
-r--r--r-- 2 root root 2.9K Jan  1  1970 Array.p_hi

Maybe someone has an idea where the dyn_hi load comes from.
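
A quick way to repeat this check for a whole package directory is to count interface files by suffix. This is a minimal sketch; count_interfaces is a hypothetical helper name, and the directory to inspect is whatever GHC library path appears in your own build log.

```shell
# Hypothetical helper: count interface files per flavour under a directory.
# Prints "<suffix> <count>" for the vanilla (.hi), profiling (.p_hi) and
# dynamic (.dyn_hi) interface files; the failing build looked for .dyn_hi.
count_interfaces() {
  dir=$1
  for suffix in hi p_hi dyn_hi; do
    n=$(find "$dir" -type f -name "*.$suffix" | wc -l)
    # $((n)) strips any padding that wc may add around the number
    printf '%s %s\n' "$suffix" "$((n))"
  done
}
```

A dyn_hi count of 0 alongside non-zero hi and p_hi counts matches the listing above: the static GHC ships no dynamic interface files at all.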

rnhmjoj commented 9 months ago

ping: @NixOS/static

sternenseemann commented 9 months ago

Good to know that the hadrian regression from #208959 has been fixed, so we can at least build GHC now.

sternenseemann commented 9 months ago

My diagnosis is the following:

I can fix that by just disabling haddock in the same way as we do for GHC < 9.6. I'll try doing that later.

@bgamari @angerman The question, of course, which you can answer better than I can, is: has anything changed w.r.t. haddock and cross with Hadrian?

bgamari commented 9 months ago

Thanks @sternenseemann! Your hypothesis does sound plausible.

Recently we did rework Haddock to take documentation from Haskell interface (.hi) files. I can't help but wonder whether this logic may be culpable: https://gitlab.haskell.org/ghc/haddock/-/blob/b0b0e0366457c9aefebcc94df74e5de4d00e17b7/haddock-api/src/Haddock.hs#L170. This was apparently introduced due to https://github.com/haskell/haddock/issues/256.

sternenseemann commented 9 months ago

Seems plausible. I'm personally not too fussed that this change means that haddock is not “retargetable”, i.e. you always need to use the precise haddock bundled with the GHC you are using to compile the documented code. In fact, we probably should explicitly tell Cabal which haddock to use, so this kind of issue doesn't happen or is easier to diagnose.

I'll need to investigate, though, under which circumstances we can build haddock with hadrian now.

sternenseemann commented 9 months ago

The problem seems to be that the haddock package is only built using the stage1 compiler (so as part of stage2), which we necessarily never reach in the case of cross compilation. Presumably we can work around this in UserSettings somehow (although IME you are quite limited if your solution is to be maintainable), but I feel like this is a genuine gap and there ought to be a better way to build a cross-compiler with hadrian…

angerman commented 9 months ago

I've just skimmed the code, but why do we do this:

  -- Inject dynamic-too into ghc options if the ghc we are using was built with
  -- dynamic linking
  flags'' <- ghc flags $ do
        df <- getDynFlags
        case lookup "GHC Dynamic" (compilerInfo df) of
          Just "YES" -> return $ Flag_OptGhc "-dynamic-too" : flags
          _ -> return flags

What's the rationale for adding -dynamic-too here? I can somewhat extract the rationale from https://github.com/haskell/haddock/issues/256, but the comment above this is rather poor. Also, there is no option one can pass to haddock to prevent this automagic.

I guess the proper thing here is to just disable haddocks for cross, and rely on native compilers haddocks.

sternenseemann commented 9 months ago

I guess the proper thing here is to just disable haddocks for cross, and rely on native compilers haddocks.

Do you mean the native compiler's haddock executable or re-using the documentation built natively? The former currently happens (unintentionally) and seems to be the source of the problem…

angerman commented 9 months ago

@sternenseemann

re-using the documentation built natively

This :D

domenkozar commented 8 months ago

It's still broken: https://github.com/domenkozar/nixpkgs-static-repo

domenkozar commented 8 months ago

Even easier reproducer:

nix-build -A pkgsStatic.haskell.packages.ghc98.th-orphans

error: builder for '/nix/store/dibiy3qjbg2l5ahlqf28axfqz5xw91xn-th-orphans-static-x86_64-unknown-linux-musl-0.13.14.drv' failed with exit code 1;
       last 10 log lines:
       > /nix/store/20rsi77ny2i4i1rbd63h4392a245j5dz-gnutar-1.35/bin/tar
       > No uhc found
       > Running phase: buildPhase
       > Preprocessing library for th-orphans-0.13.14..
       > Building library for th-orphans-0.13.14..
       > [1 of 2] Compiling Language.Haskell.TH.Instances.Internal ( src/Language/Haskell/TH/Instances/Internal.hs, dist/build/Language/Haskell/TH/Instances/Internal.o )
       > [2 of 2] Compiling Language.Haskell.TH.Instances ( src/Language/Haskell/TH/Instances.hs, dist/build/Language/Haskell/TH/Instances.o )
       >
       > <no location info>: error:
       >     Couldn't find a target code interpreter. Try with -fexternal-interpreter
       For full logs, run 'nix log /nix/store/dibiy3qjbg2l5ahlqf28axfqz5xw91xn-th-orphans-static-x86_64-unknown-linux-musl-0.13.14.drv'.

angerman commented 8 months ago

Even easier reproducer:

nix-build -A pkgsStatic.haskell.packages.ghc98.th-orphans

error: builder for '/nix/store/dibiy3qjbg2l5ahlqf28axfqz5xw91xn-th-orphans-static-x86_64-unknown-linux-musl-0.13.14.drv' failed with exit code 1;
       last 10 log lines:
       > /nix/store/20rsi77ny2i4i1rbd63h4392a245j5dz-gnutar-1.35/bin/tar
       > No uhc found
       > Running phase: buildPhase
       > Preprocessing library for th-orphans-0.13.14..
       > Building library for th-orphans-0.13.14..
       > [1 of 2] Compiling Language.Haskell.TH.Instances.Internal ( src/Language/Haskell/TH/Instances/Internal.hs, dist/build/Language/Haskell/TH/Instances/Internal.o )
       > [2 of 2] Compiling Language.Haskell.TH.Instances ( src/Language/Haskell/TH/Instances.hs, dist/build/Language/Haskell/TH/Instances.o )
       >
       > <no location info>: error:
       >     Couldn't find a target code interpreter. Try with -fexternal-interpreter
       For full logs, run 'nix log /nix/store/dibiy3qjbg2l5ahlqf28axfqz5xw91xn-th-orphans-static-x86_64-unknown-linux-musl-0.13.14.drv'.

That suggests that the GHC was not built as a stage2 compiler, or that some of the new cross target logic now prohibits native codegen as well.

sternenseemann commented 8 months ago

Yes, we are only building stage 1 here. As it turns out, for GHC < 9.6 we used to build the stage 2 compiler in this case, so seems like a detail I missed when porting the expression to hadrian.

#283773

sternenseemann commented 8 months ago

Unfortunately, also building Stage 2 doesn't fix the problem according to my testing, maybe @domenkozar can confirm on #283773.

domenkozar commented 8 months ago

new cross target logic prohibits native codegen now as well.

Does now refer to hadrian, ghc, or nixpkgs?

wolfgangwalther commented 8 months ago

Unfortunately, also building Stage 2 doesn't fix the problem according to my testing, maybe @domenkozar can confirm on https://github.com/NixOS/nixpkgs/pull/283773.

I looked into this a little bit and the problem seems to be that hadrian-based builds don't build ghc-iserv anymore, which leads to "Couldn't find a target code interpreter. Try with -fexternal-interpreter". GHC 9.4 without hadrian was still building it and thus succeeds.

I think the logic in hadrian is kind of the same as before 853c1214855e07fdb44655868532b3b6245865d4 - the full platform is compared and not something like "can execute".
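
Whether a given GHC output actually ships ghc-iserv can be checked directly. This is a rough sketch: find_iserv is a hypothetical helper name, and matching on the pattern *ghc-iserv* is an assumption about how the executable is named in the install tree.

```shell
# Hypothetical helper: print any ghc-iserv executables found beneath a GHC
# installation root. Returns non-zero when none are found.
find_iserv() {
  root=$1
  found=$(find "$root" -type f -name '*ghc-iserv*')
  [ -n "$found" ] && printf '%s\n' "$found"
}

# Intended usage (not executed here): pass the GHC store path that appears
# in the failing build log as the only argument, e.g.
#   find_iserv "$ghc_store_path" || echo "no ghc-iserv in this GHC"
```

Running it against both a hadrian-built GHC and the GHC 9.4 output should show the difference described above.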

angerman commented 8 months ago

Just sidestep the whole braindead install logic from hadrian. It's so bad...

Just build and install the compiler with cp.

angerman commented 8 months ago

The haskell.nix builder for GHC works around this by sidestepping hadrian's build and install process and doing it a bit more explicitly.

https://github.com/input-output-hk/haskell.nix/blob/6eaafcdf04bab7be745d1aa4f74d2cc85700042b/compiler/ghc/default.nix#L787

I'm not even sure Hadrian can (or should) be fixed. The proper solution seems to be to just bin it outright and build GHC with cabal only.

domenkozar commented 8 months ago

I can confirm it works with haskell.nix: https://github.com/domenkozar/nixpkgs-static-repo/tree/haskell.nix

wolfgangwalther commented 7 months ago

I looked into this a little bit and the problem seems to be that hadrian-based builds don't build ghc-iserv anymore, which leads to "Couldn't find a target code interpreter. Try with -fexternal-interpreter". GHC 9.4 without hadrian was still building it and thus succeeds.

As pointed out by @sternenseemann in https://github.com/NixOS/nixpkgs/pull/287794#issuecomment-1937085851, the missing ghc-iserv is not exactly the reason for this error message, but as I mentioned in https://github.com/NixOS/nixpkgs/pull/287794#issuecomment-1937089606 probably closely related:

I think iserv can't be built, because it needs a GHCi built with -internal-interpreter, which is not built via hadrian

If we build with -finternal-interpreter, then maybe the "Couldn't find a target code interpreter." would be solved. I am referring to this part in the hadrian source:

https://gitlab.haskell.org/ghc/ghc/-/blob/master/hadrian/src/Settings/Packages.hs#L127-156

A workaround mentioned there could be to build the static/cross compiler with the same version of GHC as the bootstrap compiler:

          -- The workaround we use is to check if the bootstrap compiler has
          -- the same version as the one we are building. In this case we can
          -- avoid the first step above and directly build with
          -- `-finternal-interpreter`.

FTR, I tried that. To do so, I had to patch the hadrian source to allow a newer Cabal first. I then changed the bootPkgs to use ghc981 when building cross:

      bootPkgs =
        if stdenv.hostPlatform != stdenv.targetPlatform then
          buildPackages.haskell.packages.ghc981
        else
          packages.ghc947;

The build then fails with a lot of this:

ghc/Main.hs:18:1: error: [GHC-53693]
    Something is amiss; requested module  ghc-9.8.1:GHC differs from name found in the interface file ghc:GHC (if these names look the same, try again with -dppr-debug)

I have no idea what that means and haven't gone further yet. Just putting this here in case somebody has an idea.

angerman commented 7 months ago

You can explicitly build iserv using hadrian. It's not a default target for some reason. And then sidestep the broken install phase of hadrian. You don't need any of this anyway with nix as you a priori know all your install locations. So you can replace the convoluted install phase with a simple cp.

EDIT: just use the same logic we have in Haskell.nix, it should be translatable to the nixpkgs GHC builder: https://github.com/NixOS/nixpkgs/issues/275304#issuecomment-1915762455, both builders are still fairly similar.

wolfgangwalther commented 4 months ago

Just confirmed this is still a problem with GHC 9.10.1.

sternenseemann commented 4 months ago

The build then fails with a lot of this:

This is a hadrian bug, there's apparently a patch on GHC master (9.10?). That being said, self bootstrapping isn't exactly tested upstream and has become annoying with hadrian due to the strict bounds.

wolfgangwalther commented 4 months ago

Just confirmed this is still a problem with GHC 9.10.1.

To be precise: I tested pkgsStatic.haskell.packages.ghc9101.th-orphans still fails with the above external interpreter error message.

I did not test, at least yet, the self-bootstrapping approach I tried earlier with GHC 9.8. That might still be worth a try.

NorfairKing commented 3 months ago

Good to know that the hadrian regression from #208959 has been fixed, so we can at least build GHC now.

Where/when was it fixed? I'd like to patch the ghc I'm using so I'd need to know which commit fixed this.

NorfairKing commented 3 days ago

EDIT: Mistake, ignore.

wolfgangwalther commented 3 days ago

With #208959 fixed, I now get the same external-interpreter error when building th-orphans for both GHC 9.6 and 9.8. This makes sense, since the error is caused by the hadrian build, which both use. So this issue really applies to both now.

Edit: And as mentioned above for GHC 9.10 as well. So basically for GHC 9.6 up.

wolfgangwalther commented 3 days ago

Just confirmed this is still a problem with GHC 9.10.1.

To be precise: I tested pkgsStatic.haskell.packages.ghc9101.th-orphans still fails with the above external interpreter error message.

I did not test, at least yet, the self-bootstrapping approach I tried earlier with GHC 9.8. That might still be worth a try.

I was able to successfully bootstrap GHC 9.10.1 for pkgsStatic from GHC 9.10.1 itself this time.

It still doesn't solve the problem at hand, though:

<no location info>: error:
    Couldn't find a target code interpreter. Try with -fexternal-interpreter

All still the same.