commercialhaskell / stack

The Haskell Tool Stack
http://haskellstack.org
BSD 3-Clause "New" or "Revised" License

Arch Linux: GHC's configure script fails when LDFLAGS contains pack-relative-relocs #6525

Closed: hseg closed this issue 6 months ago

hseg commented 6 months ago

General summary/comments

On a fresh system ($XDG_DATA_HOME/stack empty), a bare stack build fails, erroring out at the Installing GHC step. However, stack setup && stack build succeeds. This runs counter to my understanding of how stack build is intended to behave when it cannot find an installed GHC.

Steps to reproduce

#!/bin/bash

good="$(mktemp -d)"
bad="$(mktemp -d)"

stack --stack-root "$bad"  --verbose build 2>&1 | tee stack-bad.log
stack --stack-root "$good" --verbose setup 2>&1 | tee stack-good.log
echo '>>> Stack build'                          | tee -a stack-good.log
stack --stack-root "$good" --verbose build 2>&1 | tee -a stack-good.log

Expected

stack build notices that no GHC is installed, runs stack setup, then proceeds with the build using the newly installed GHC.

Actual

Logs attached (ignore the S-8506 error in the good log; it is due to the testing directory not containing a stack.yaml file): stack-bad.log stack-good.log

Stack version

stack --version
Version 2.15.3, Git revision cffdec6ea6cf4500e08c92fea044c48a6032759d x86_64 hpack-0.36.0

Method of installation

https://aur.archlinux.org/packages/stack-static, patched to install the 2.15.3 binary release.

Platform

Arch Linux 6.7.9

hseg commented 6 months ago

(This should go without saying, but this workflow worked as recently as two weeks ago with stack-static 2.15.1. At least it is good to know that the workaround of explicitly calling stack setup works.)

mpilgrem commented 6 months ago

@hseg, thanks for reporting.

On Windows 11, with Stack 2.15.3, first run of stack build outside of a project directory:

  • creates the missing config.yaml in the Stack root
  • creates the missing global-project in the Stack root (which is populated with lts-22.11 due to https://github.com/commercialhaskell/stack/issues/6516)
  • fetches the requested version of GHC
  • correctly throws error [S-8506] (no target)

On the Ubuntu distribution of Linux (via WSL), Stack 2.15.3 does the same.

Do you get the same problem on Arch Linux with an 'official' build of Stack 2.15.3 for Linux?

hseg commented 6 months ago

The official stack package for Arch Linux is at 2.9.1, and testing it would require rebuilding 67 dependencies. I can try later this week.

One important note, though: the bug does not occur with a fresh project, but rather with a fresh Stack install, i.e. with $XDG_DATA_HOME/stack empty, so that stack build needs to fetch GHC. Judging by your description, you might be exercising the same code path, so this point might be moot.

mpilgrem commented 6 months ago

By 'official', I meant the binary distributions provided via this repository - but I would include the ones provided via GHCup too. Understood on a 'fresh install' - I think I deleted enough of my Stack root to mimic that (I deleted the entire Stack root on Ubuntu).

hseg commented 6 months ago

Testing with a GHCup-vendored stack, I reproduce your lack of errors, so presumably the GitHub release is at fault. Indeed, the two binaries differ:

$ file ghcup/stack-2.15.3-linux-x86_64/stack 
ghcup/stack-2.15.3-linux-x86_64/stack: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=491a5805a6dd0316cc949ec155d3e4cd2b5f269d, stripped
$ file pacman/stack-2.15.3-linux-x86_64/stack 
pacman/stack-2.15.3-linux-x86_64/stack: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=27d126ba061fdc227fccd6ae92feb90715899073, stripped

(ghcup pulls https://downloads.haskell.org/~ghcup/unofficial-bindists/stack/2.15.3/stack-2.15.3-linux-x86_64.tar.gz while the PKGBUILD I installed pulls https://github.com/commercialhaskell/stack/releases/download/v2.15.3/stack-2.15.3-linux-x86_64.tar.gz )

However, for some reason I cannot reproduce it with my locally packaged stack right now either. So either this was a network/infrastructure error that has since been fixed, or it is due to my testing this at uni today; I will check again when I get home later today.

mpilgrem commented 6 months ago

That pacman/stack-2.15.3-linux-x86_64/stack is indeed the same as the 'official' Stack 2.15.3:

$ file /home/mpilgrem/.local/bin/stack
/home/mpilgrem/.local/bin/stack: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=27d126ba061fdc227fccd6ae92feb90715899073, stripped

hseg commented 6 months ago

Further testing suggests this is due to a poor interaction between stack and makepkg -- I could only reproduce it when running stack under makepkg.

Going through the logs, I suspected the {C,CXX,LD,MAKE}FLAGS variables that makepkg sets, since they might interfere with Stack's operation. Their values are:

DEBUGFLAGS="-g -ffile-prefix-map=/tmp/src=/usr/src/debug/test-stack -flto=auto"
CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions \
        -Wp,-D_FORTIFY_SOURCE=3 -Wformat -Werror=format-security \
        -fstack-clash-protection -fcf-protection \
        -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer \
        $DEBUGFLAGS"
CXXFLAGS="$CFLAGS -Wp,-D_GLIBCXX_ASSERTIONS $DEBUGFLAGS"
LDFLAGS="-Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now \
         -Wl,-z,pack-relative-relocs -flto=auto"
MAKEFLAGS="-j4"

And indeed (see stack-build-envvars.log), setting these variables makes stack bork. I am out of time (and spoons) for tonight to bisect these assignments and find out which one is to blame; I hope these help.

(I can set these flags to different values; I just need to know which ones to change.)
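Scripting that bisection might look like the following minimal bash sketch. bisect_flags is a hypothetical helper, not part of Stack or makepkg: for each of CFLAGS, CXXFLAGS, LDFLAGS and MAKEFLAGS that is set in the current shell, it runs a command with only that one variable in the environment (the other three are removed with env -u) and prints the names for which the command fails. It assumes bash 4.2 or later and GNU env.

```shell
# Hypothetical helper (not part of Stack or makepkg): find which makepkg
# flag variable, exported on its own, makes a command fail.

# bisect_flags CMD [ARGS...]
# For each flag variable that is set in this shell, run CMD with that one
# variable in its environment and the other three removed.
# Prints the name of every variable for which CMD exits non-zero.
bisect_flags() {
  local name
  for name in CFLAGS CXXFLAGS LDFLAGS MAKEFLAGS; do
    [[ -v $name ]] || continue   # skip variables that are not set
    if ! env -u CFLAGS -u CXXFLAGS -u LDFLAGS -u MAKEFLAGS \
        "$name=${!name}" "$@" >/dev/null 2>&1; then
      printf '%s\n' "$name"
    fi
  done
}

# Example (assumes stack is on PATH and the four variables hold the values
# listed above; each run uses a throwaway Stack root, so GHC is downloaded
# and configured afresh every time):
#   bisect_flags sh -c 'stack setup --stack-root "$(mktemp -d)"'
```

Each run is slow because it installs GHC from scratch, but because every run starts from an empty Stack root, a failure can be attributed to the single variable that was set.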

EDIT: Github disallows updating email replies to markdown format even after the fact, logs attached to comment below.

hseg commented 6 months ago

Github really doesn't like email responses, does it? Cleaned up previous comment, sorry for the mess in your inboxes. The following are the files I attempted to attach to the previous comment: makepkg.log stack-build-envvars.log stack-build.log

And since GitHub doesn't accept attachments without a file extension, here is the PKGBUILD:

pkgname=test-stack
pkgver=1
pkgrel=1
pkgdesc='Testing running stack in PKGBUILD'
arch=('x86_64')
url='https://github.com/commercialhaskell/stack'
license=('CC0')
makedepends=('stack')
source=()

prepare() {
    #stack setup
    :
}

build() {
    stack build --verbose
}

package() {
    :
}

mpilgrem commented 6 months ago

I am not familiar with makepkg. In the failure log, this line during GHC's configuration process seems important:

2024-03-18 00:35:55.435076: [error] configure: error: Failed to determine machine word size. Does your toolchain actually work?

It seems that something is preventing GHC's configure step from succeeding during installation. That is, if this is an issue, it currently seems to me to be at the level of GHC rather than Stack.

hseg commented 6 months ago

Hence my narrowing the bug down to the {C,CXX,LD,MAKE}FLAGS environment variables: running stack build --verbose with these variables set to the values I posted in https://github.com/commercialhaskell/stack/issues/6525#issuecomment-2002643812 reproduces the failure to install GHC that I originally reported, without needing anything Arch-specific as far as I can tell.

(These are the default settings Arch uses to build the packages it distributes, which is why they were set when running stack under makepkg.)

mpilgrem commented 6 months ago

Stack installs GHC by, essentially, programmatically following the manual installation instructions in the INSTALL file provided with the GHC binary distribution: https://downloads.haskell.org/~ghc/9.6.4/ghc-9.6.4-x86_64-fedora33-linux.tar.xz. Stack itself does not pay any attention to any of those environment variables.

I searched GHC's issues for makepkg but did not identify an existing issue: https://gitlab.haskell.org/search?group_id=2&scope=issues&search=makepkg.

As you understand better than I do what makepkg is and how it might be adversely affecting GHC's configure script, you are probably better placed than me to raise a GHC issue.

mpilgrem commented 6 months ago

Pinging @hasufell (an expert in installing GHC on various Linux distributions) in case he can provide any insight.

hseg commented 6 months ago

OK, so I have minimized the failing case:

export LDFLAGS='-Wl,-z,pack-relative-relocs'
stack build --verbose --stack-root "$(mktemp -d)"

Relevant Arch RFC to provide context: https://rfc.archlinux.page/0023-pack-relative-relocs/
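One possible local workaround, while the toolchain in use rejects this flag, is to drop just that flag before invoking Stack, for example at the top of a PKGBUILD's build() function. This is a minimal bash sketch under that assumption; strip_flag is a hypothetical helper, not part of makepkg or Stack, and it treats LDFLAGS as a list of flags separated by single spaces.

```shell
# Hypothetical helper (not part of makepkg or Stack): print FLAGS with every
# occurrence of the single flag FLAG removed.
# Usage: strip_flag FLAGS FLAG
strip_flag() {
  local flags=" $1 " flag=" $2 "
  # Replace " FLAG " with " " until none remain; the loop catches adjacent
  # repeats, which share a space and so escape a single global replacement.
  while [[ $flags == *"$flag"* ]]; do
    flags=${flags//"$flag"/ }
  done
  # Trim the padding spaces added above.
  flags=${flags# }
  flags=${flags% }
  printf '%s\n' "$flags"
}

# Example, e.g. at the start of build() in a PKGBUILD:
#   export LDFLAGS="$(strip_flag "$LDFLAGS" -Wl,-z,pack-relative-relocs)"
#   stack build
```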

Logs attached, though presumably Github will reproduce them below instead.

hseg commented 6 months ago

... Logs didn't attach, here they are: stack.log

mpilgrem commented 6 months ago

@hseg, that stack.log file seems to be 0 bytes in size. Can you re-supply it?

mpilgrem commented 6 months ago

@hseg, also, to rule things out, noting "supported since glibc 2.36, GNU Binutils 2.38 and LLVM 15": can you confirm that your system meets all of those prerequisites?
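For anyone checking the same prerequisites, here is a minimal bash sketch of a version comparison. It assumes GNU sort's -V (version sort) option, which approximates but does not exactly reproduce pacman's own version comparison rules; version_ge and check_min are hypothetical helper names, and the minimum versions are the ones quoted above.

```shell
# version_ge A B: succeed if version A is at least version B.
# Sorts the two versions with GNU sort -V and checks that B comes first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# check_min NAME INSTALLED MINIMUM: report whether a component is new enough.
check_min() {
  if version_ge "$2" "$3"; then
    printf '%s %s >= %s: ok\n' "$1" "$2" "$3"
  else
    printf '%s %s < %s: too old\n' "$1" "$2" "$3"
  fi
}

# Example on Arch Linux (pacman -Q prints "name version-pkgrel"):
#   check_min glibc    "$(pacman -Q glibc    | awk '{print $2}')" 2.36
#   check_min binutils "$(pacman -Q binutils | awk '{print $2}')" 2.38
```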

mpilgrem commented 6 months ago

Found on the Internet - by analogy, could it be LLVM-related:

mpilgrem commented 6 months ago

Extracts from GHC's configure script:

# ac_fn_c_compute_int LINENO EXPR VAR INCLUDES
# --------------------------------------------
# Tries to find the compile-time value of EXPR in a program that includes
# INCLUDES, setting VAR accordingly. Returns whether the value could be
# computed
ac_fn_c_compute_int ()
{

...

# The cast to long int works around a bug in the HP C Compiler
# version HP92453-01 B.11.11.23709.GP, which incorrectly rejects
# declarations like `int a3[[(sizeof (unsigned char)) >= 0]];'.
# This bug is HP SR number 8606223364.
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking size of void *" >&5
$as_echo_n "checking size of void *... " >&6; }
if ${ac_cv_sizeof_void_p+:} false; then :
  $as_echo_n "(cached) " >&6
else
  if ac_fn_c_compute_int "$LINENO" "(long int) (sizeof (void *))" "ac_cv_sizeof_void_p"        "$ac_includes_default"; then :

else
  if test "$ac_cv_type_void_p" = yes; then
     { { $as_echo "$as_me:${as_lineno-$LINENO}: error: in \`$ac_pwd':" >&5
$as_echo "$as_me: error: in \`$ac_pwd':" >&2;}
as_fn_error 77 "cannot compute sizeof (void *)
See \`config.log' for more details" "$LINENO" 5; }
   else
     ac_cv_sizeof_void_p=0
   fi
fi

fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_sizeof_void_p" >&5
$as_echo "$ac_cv_sizeof_void_p" >&6; }

cat >>confdefs.h <<_ACEOF
#define SIZEOF_VOID_P $ac_cv_sizeof_void_p
_ACEOF

if test "x$ac_cv_sizeof_void_p" = "x0"; then
    as_fn_error $? "Failed to determine machine word size. Does your toolchain actually work?" "$LINENO" 5
fi
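In outline, the check above asks the C toolchain to build and run a tiny program that reports sizeof (void *). When that fails, the cached size falls back to 0 (unless an earlier type check for void * had succeeded, in which case configure stops with "cannot compute sizeof (void *)" instead), and GHC's configure then aborts with the word-size error. The bash sketch below mimics that behaviour so that a given CC, CFLAGS and LDFLAGS can be tried outside GHC's configure. It is a simplification, not autoconf's implementation: word_size is a hypothetical name, and the argument order follows autoconf's usual C link command ($CC -o conftest $CFLAGS $CPPFLAGS $LDFLAGS conftest.c $LIBS), minus CPPFLAGS and LIBS.

```shell
# Hypothetical helper: roughly mimic configure's "checking size of void *".
# Builds and runs a tiny C program with the caller's CC, CFLAGS and LDFLAGS
# and prints the size of a pointer in bytes, or 0 if any step fails (the
# value that makes GHC's configure report "Failed to determine machine word
# size").
word_size() {
  local cc=${CC:-cc} dir size
  dir=$(mktemp -d) || { echo 0; return 1; }
  cat >"$dir/conftest.c" <<'EOF'
#include <stdio.h>
int main(void) { printf("%d\n", (int) sizeof (void *)); return 0; }
EOF
  # CFLAGS and LDFLAGS are left unquoted on purpose so that they split into
  # separate compiler arguments, as they do in configure's link command.
  if "$cc" -o "$dir/conftest" ${CFLAGS-} ${LDFLAGS-} "$dir/conftest.c" \
       >/dev/null 2>&1 &&
     size=$("$dir/conftest" 2>/dev/null) && [ -n "$size" ]; then
    echo "$size"
  else
    echo 0
  fi
  rm -rf "$dir"
}

# Example: compare the default toolchain with the minimized LDFLAGS.
#   word_size
#   LDFLAGS='-Wl,-z,pack-relative-relocs' word_size
```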

hseg commented 6 months ago

Oops. Another attempt at attaching the log: stack.log

As for the package versions:

$ pacman -Q binutils glibc llvm-libs 
binutils 2.42-2
glibc 2.39-1
llvm-libs 17.0.6-2

I don't have llvm itself installed, just the runtime libraries, so I am unsure what's going on there.

hseg commented 6 months ago

Indeed, tracing the ./configure output, it appears to be invoking GCC, so that part isn't exotic.

hseg commented 6 months ago

Found https://gitlab.archlinux.org/archlinux/packaging/packages/pacman/-/merge_requests/6#note_171460, will report there later today, see what they have to say

hasufell commented 6 months ago

Archlinux is a clusterf*ck. It has been one of the worst distributions to use Haskell on. First they forced dynamic linking via their PKGBUILDs, causing so much trouble with cabal and other tooling. They really have no idea what they are doing.

We're already warning users about arch being broken crap: https://www.haskell.org/downloads/

Do not use the Haskell development tools provided by Arch, they are broken. For more information see [1] [2].

This just reinforces it. Don't use arch. They don't know what they are doing.


Wrt the LDFLAGS... I asked some people knowledgeable about linking, and they think it's bonkers to force pack-relative-relocs.

hseg commented 6 months ago

Reported this at Arch as well, here's hoping we find a solution better than "burn it all down" https://gitlab.archlinux.org/archlinux/packaging/packages/pacman/-/merge_requests/6#note_171667

Though the brokenness of Haskell on Arch is indeed why I build all my Haskell programs statically, relying on stack for dependency resolution rather than pacman.

mpilgrem commented 6 months ago

I am going to close this issue, from Stack's perspective, as it seems to be, squarely, 'upstream'. The discussion and links here should be of help to other Arch Linux users who encounter it.

hasufell commented 6 months ago

I still think this should also be raised as a GHC issue.

hseg commented 6 months ago

Fair enough, raised: https://gitlab.haskell.org/ghc/ghc/-/issues/24565

Another tangent that could be explored here: why does stack build fail with these LDFLAGS, while stack setup && stack build succeeds?

mpilgrem commented 6 months ago

On that question, the answer seems to be that the environment was set differently in each case (which you can see in the logs in the line [debug] menv = fromList ...).
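To run that check over several saved logs, a quick bash sketch follows. It assumes that the menv line renders each environment entry as a Haskell pair such as ("LDFLAGS","..."), which fits the fromList form shown in the log but is otherwise an assumption; ldflags_in_menv is a hypothetical helper.

```shell
# Hypothetical helper: report whether the "[debug] menv = fromList [...]"
# line of a stack --verbose log contains an LDFLAGS entry.
# Assumes entries are printed as ("NAME","value") pairs.
ldflags_in_menv() {
  if grep -qE 'menv = fromList.*\("LDFLAGS",' "$1"; then
    echo "$1: LDFLAGS set"
  else
    echo "$1: LDFLAGS not set"
  fi
}

# Example:
#   for log in stack-bad.log stack-good.log makepkg.log stack.log; do
#     ldflags_in_menv "$log"
#   done
```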

If we take the log files above in order:

  • stack-bad.log: LDFLAGS not set -> GHC configures
  • stack-good.log: LDFLAGS not set -> GHC configures
  • makepkg.log: LDFLAGS set -> GHC does not configure
  • stack-build-envvars.log: LDFLAGS set -> GHC does not configure
  • stack-build.log: LDFLAGS not set -> GHC configures
  • stack.log: LDFLAGS set -> GHC does not configure

The logs above do not, in fact, include a case where (a) LDFLAGS was set and (b) stack setup succeeded.

hseg commented 6 months ago

Hrm. Indeed, testing

export LDFLAGS='-Wl,-z,pack-relative-relocs'
stack setup --stack-root "$(mktemp -d)" --verbose

I do indeed get the Failed to determine machine word size error.

A little investigation later, it appears that what I was noticing is intended makepkg behaviour: it sets LDFLAGS as a non-exported variable in prepare() (which is where I was running stack setup), but exports it in build() (which is where I was running stack build).

And indeed, moving the stack setup to build() reproduces the error. So our attribution of the error to the LDFLAGS setting is correct, and makepkg just muddied the waters.
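The prepare()/build() difference comes down to whether LDFLAGS is exported. A shell variable that is assigned but not exported is not passed to child processes, so neither stack nor the GHC configure script it launches would see it. A minimal bash illustration (the function names are hypothetical):

```shell
# Print LDFLAGS as a child process sees it.
show_child_ldflags() {
  sh -c 'printf "%s\n" "${LDFLAGS-<unset>}"'
}

export_demo() {
  unset LDFLAGS                           # start clean (also drops any export)
  LDFLAGS='-Wl,-z,pack-relative-relocs'   # plain shell variable: not exported
  show_child_ldflags                      # child prints <unset>
  export LDFLAGS                          # now part of the environment
  show_child_ldflags                      # child prints the flag
}
```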

hseg commented 6 months ago

In re-raising this with Arch, I found the root cause: ghc and other packages were building with LD=ld.gold, which doesn't support these LDFLAGS. GHC now tests that $LD works with $LDFLAGS before settling on that choice of LD. Might this be worth implementing in Stack as a sanity check? I have suggested this for Cabal as well: https://github.com/haskell/cabal/issues/9828
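Such a sanity check could look roughly like the following. This is a hypothetical bash sketch, not code from GHC, Stack or Cabal: pick_linker tries each candidate linker through the C compiler's -fuse-ld= option (which GCC accepts for bfd and gold, among others), links a trivial program with the current LDFLAGS, and prints the first candidate that succeeds.

```shell
# Hypothetical helper: print the first linker, from the candidates given,
# that can link a trivial C program with the current LDFLAGS.
# Usage: pick_linker CANDIDATE...   (e.g. pick_linker gold bfd)
# Returns non-zero if no candidate works.
pick_linker() {
  local cc=${CC:-cc} dir ld status=1
  dir=$(mktemp -d) || return 1
  printf 'int main(void) { return 0; }\n' >"$dir/conftest.c"
  for ld in "$@"; do
    # LDFLAGS is deliberately unquoted so that it splits into separate flags.
    if "$cc" -fuse-ld="$ld" -o "$dir/conftest" ${LDFLAGS-} "$dir/conftest.c" \
         >/dev/null 2>&1; then
      printf '%s\n' "$ld"
      status=0
      break
    fi
  done
  rm -rf "$dir"
  return "$status"
}

# Example: prefer gold, but fall back to bfd if gold rejects LDFLAGS.
#   ld=$(pick_linker gold bfd) || echo "no working linker" >&2
```

Trying linkers in order of preference and keeping the first one that links successfully is the essence of the check; a fuller version would also need to cope with compilers that do not understand -fuse-ld= at all.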

hasufell commented 6 months ago

Yes, the choice of gold was an odd one. GHCup is already forcing ld.bfd on alpine, because gold is causing problems.

mpilgrem commented 5 months ago

GHC 9.6.5, now released, includes "Ensuring we take LDFLAGS into account when configuring a linker (#24565)."