@hasufell, the source code (src/setup-shim/StackSetupShim.hs) has:
{-# LANGUAGE CPP #-}
{-# LANGUAGE PackageImports #-}
module StackSetupShim where
import Main
#if defined(MIN_VERSION_Cabal)
#if MIN_VERSION_Cabal(3,8,1)
import Distribution.PackageDescription (PackageDescription, emptyHookedBuildInfo)
#else
import "Cabal" Distribution.PackageDescription (PackageDescription, emptyHookedBuildInfo)
#endif
#else
import Distribution.PackageDescription (PackageDescription, emptyHookedBuildInfo)
#endif
I am wondering if I have the logic of the #if MIN_VERSION_Cabal(3,8,1) 'the wrong way around'. The history of this is at:
I'm not sure. It seems to be spurious. Sometimes the CI succeeds, sometimes not.
Some comments for my own use:
If the 'setup-exe' has not been cached, it is built with (for example, extracts, reformatted):
...\ghc-9.6.4\bin\ghc-9.6.4.exe
-rtsopts
-threaded
-clear-package-db # Clear the current package database stack
-global-package-db # Add the global package database to the top of the current stack
-hide-all-packages # All packages need to be explicitly exposed
-package base
-main-is StackSetupShim.mainOverride
-package Cabal-3.10.1.0
C:\sr\setup-exe-src\setup-9p6GVs8J.hs
C:\sr\setup-exe-src\setup-shim-9p6GVs8J.hs
-o C:\sr\setup-exe-cache\x86_64-windows\tmp-Cabal-simple_9p6GVs8J_3.10.1.0_ghc-9.6.4.exe
the -package Cabal-3.10.1.0 being the GHC boot package for the specified GHC. I think this can't be the problem, because all the packages are hidden from GHC except for base and Cabal.
However, there is also code in Stack.Build.ExecuteEnv.withSingleContext as part of withCabal (extract):
runExe compilerPath $
  [ "--make"
  , "-odir", toFilePathNoTrailingSep setupDir
  , "-hidir", toFilePathNoTrailingSep setupDir
  , "-i", "-i."
  ] ++ packageArgs ++
  [ toFilePath setuphs
  , toFilePath ee.setupShimHs
  , "-main-is"
  , "StackSetupShim.mainOverride"
  , "-o", toFilePath outputFile
  , "-threaded"
  ] ++
  -- Apply GHC options
  -- https://github.com/commercialhaskell/stack/issues/4526
  map
    T.unpack
    ( Map.findWithDefault
        []
        AGOEverything
        config.ghcOptionsByCat
    ++ case config.applyGhcOptions of
         AGOEverything -> ee.buildOptsCLI.ghcOptions
         AGOTargets -> []
         AGOLocals -> []
    )
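To make the tail of that extract easier to follow, here is a minimal, self-contained sketch of how the extra GHC options appear to be chosen. Only the three AGO... constructor names come from the extract. The function name setupGhcOptions, the association list standing in for config.ghcOptionsByCat, and the option strings are assumptions for illustration, not Stack's actual code.

```haskell
module Main (main) where

import Data.Maybe (fromMaybe)

-- Constructor names mirror the extract; everything else is illustrative.
data ApplyGhcOptions = AGOTargets | AGOLocals | AGOEverything
  deriving (Eq, Show)

-- Options appended to the setup-executable compile, following the shape
-- of the extract: first the per-category options registered under
-- AGOEverything, then the command-line options, but only when those are
-- configured to apply to everything.
setupGhcOptions
  :: [(ApplyGhcOptions, [String])] -- stand-in for config.ghcOptionsByCat
  -> ApplyGhcOptions               -- stand-in for config.applyGhcOptions
  -> [String]                      -- stand-in for the CLI GHC options
  -> [String]
setupGhcOptions byCat apply cliOpts =
  fromMaybe [] (lookup AGOEverything byCat)
    ++ case apply of
         AGOEverything -> cliOpts
         AGOTargets    -> []
         AGOLocals     -> []

main :: IO ()
main = do
  -- Hypothetical configuration values, chosen only for the demonstration.
  let byCat = [(AGOEverything, ["-O0"]), (AGOLocals, ["-Wall"])]
  print (setupGhcOptions byCat AGOEverything ["-fhide-source-paths"])
  print (setupGhcOptions byCat AGOLocals ["-fhide-source-paths"])
```

In this model, options registered for other categories (here AGOLocals) never reach the setup compile, which matches the extract's lookup of AGOEverything only.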
So, if I take the example given in https://github.com/commercialhaskell/stack/commit/79e61349b5bdba8ed0326f41d6b58b2938cc732c, one of the steps is (extracts, reformatted):
process > configure
[debug] Run process within ...\process-1.6.15.0\:
...\ghc-9.0.2\bin\ghc-9.0.2.exe
--make
-odir ...\process-1.6.15.0\.stack-work\dist\f972502b\setup
-hidir ...\process-1.6.15.0\.stack-work\dist\f972502b\setup
-i
-i.
-package=Cabal-3.4.1.0 # Expose this version of the Cabal package
-clear-package-db
-global-package-db
-package-db=C:\sr\snapshots\dfce5266\pkgdb
...\process-1.6.15.0\Setup.hs
C:\sr\setup-exe-src\setup-shim-9p6GVs8J.hs
-main-is StackSetupShim.mainOverride
-o ...\process-1.6.15.0\.stack-work\dist\f972502b\setup\setup
-threaded
-haddock
...
[info] process > [2 of 2] Compiling StackSetupShim ...
[info] process > Linking ...\process-1.6.15.0\.stack-work\dist\f972502b\setup\setup.exe ...
At the end of the build, the package database C:\sr\snapshots\dfce5266\pkgdb contains the packages specified as extra-deps: Cabal-3.8.1.0-9FYtgyN7c9W8mwkFl4FEqj.conf, Cabal-syntax-3.8.1.0-CC9Bfx5CE7X4XpP9arS2OL.conf, process-1.6.15.0-DsrkIC2fIQkBJadX1qo1Db.conf.
@hasufell, an odd thing about your error message:
2024-03-16T10:34:46.4940030Z streamly > Ambiguous module name ‘Distribution.PackageDescription’:
2024-03-16T10:34:46.5010610Z streamly > it was found in multiple packages:
2024-03-16T10:34:46.5072480Z streamly > Cabal-syntax-3.8.1.0 Cabal-syntax-3.8.1.0
is that it is referring to the same package identifier Cabal-syntax-3.8.1.0 twice.
Back when I was experiencing ambiguity, it was between different packages:
process > Ambiguous module name ‘Distribution.PackageDescription’:
process > it was found in multiple packages:
process > Cabal-3.4.1.0 Cabal-syntax-3.8.1.0
EDIT: I think that is the problem (the same package identifier listed twice, 'conflicting with itself'), not the source code for the shim.
I think the logic of my:
#if MIN_VERSION_Cabal(3,8,1)
import Distribution.PackageDescription (PackageDescription, emptyHookedBuildInfo)
#else
import "Cabal" Distribution.PackageDescription (PackageDescription, emptyHookedBuildInfo)
#endif
was that (a) with Cabal >= 3.8.1 GHC knew how to handle the clash of module names between Cabal and Cabal-syntax but (b) with Cabal < 3.8.1 it did not, so "Cabal" had to be specified expressly.
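As a self-contained illustration of that mechanism, the sketch below has the same three-way CPP structure as the shim, but it is keyed on base rather than Cabal so that it compiles without Cabal being visible. The choice of base, the 4.0.0 threshold, and Data.List are assumptions made only for the demonstration. The point is the package-qualified form: with PackageImports, the string after import names the one package allowed to supply the module.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE PackageImports #-}
module Main (main) where

-- Same three-way structure as the shim, but keyed on base so that the
-- sketch compiles without Cabal being visible.
#if defined(MIN_VERSION_base)
#if MIN_VERSION_base(4,0,0)
-- Plain import: GHC resolves the module among all exposed packages.
import Data.List (sort)
#else
-- Package-qualified import: only the named package may supply the module.
import "base" Data.List (sort)
#endif
#else
import Data.List (sort)
#endif

main :: IO ()
main = print (sort [3, 1, 2 :: Int])
```

A package-qualified import only helps when the two candidates have different package names. It cannot separate two instances that share both name and version.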
The GHC-supplied Cabal-3.8.1.0.conf includes:
exposed-modules:
...
Distribution.PackageDescription from Cabal-syntax-3.8.1.0:Distribution.PackageDescription,
...
As indicated, I could not reproduce on Windows 11. The snapshot is lts-21.25 (GHC 9.4.8, which comes with Cabal-3.8.1.0 and Cabal-syntax-3.8.1.0). Cabal-3.8.1.0 and Cabal-syntax-3.8.1.0 are also specified as extra-deps (presumably because other GHC boot packages have been overridden), which explains why they are built anyway.
I wonder if package shadowing is not occurring as it should for GHC 9.4.8 on macOS?
On Windows 11, using stack --verbose build streamly, the call to GHC has the form above:
Run process within ...\streamly-0.8.3\:
...\ghc-9.4.8\bin\ghc-9.4.8.exe
--make
-odir ...\streamly-0.8.3\.stack-work\dist\f1a1ac53\setup
-hidir ...\streamly-0.8.3\.stack-work\dist\f1a1ac53\setup
-i
-i.
-package=Cabal-3.8.1.0
-clear-package-db
-global-package-db
-package-db=C:\sr\snapshots\612a4b31\pkgdb
...\streamly-0.8.3\Setup.hs
C:\sr\setup-exe-src\setup-shim-9p6GVs8J.hs
-main-is StackSetupShim.mainOverride
-o ...\streamly-0.8.3\.stack-work\dist\f1a1ac53\setup\setup
-threaded
-haddock
An observation:
Returning to my https://github.com/commercialhaskell/stack/commit/79e61349b5bdba8ed0326f41d6b58b2938cc732c example:
The GHC-supplied Cabal-3.8.1.0.conf has (extract):
...
id: Cabal-3.8.1.0
...
abi: c52458c52981c7dfb414e1264fc98757
...
exposed-modules:
...
Distribution.PackageDescription from Cabal-syntax-3.8.1.0:Distribution.PackageDescription,
...
but the extra-dep Cabal-3.8.1.0-9FYtgyN7c9W8mwkFl4FEqj.conf has (extract):
...
id: Cabal-3.8.1.0-9FYtgyN7c9W8mwkFl4FEqj
...
abi: d42177acfcf8b8f3b778fbe4e0cfa473
...
exposed-modules:
...
Distribution.PackageDescription from Cabal-syntax-3.8.1.0-CC9Bfx5CE7X4XpP9arS2OL:Distribution.PackageDescription,
Could something like that, somehow, result in GHC 9.4.8 thinking two Cabal-syntax-3.8.1.0 are in play on macOS?
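To make that question concrete, here is a deliberately simplified model (not GHC's implementation) of how one module name could be reported as ambiguous between two entries that print identically. It assumes that GHC distinguishes instances by the id field but prints only the name-version identifier in the error message. The type Unit and the functions providersOf and ambiguityReport are hypothetical names. The two id values are the ones quoted in the extracts above.

```haskell
module Main (main) where

import Data.List (nub)

-- One installed package instance: a simplified model of a *.conf entry.
data Unit = Unit
  { unitId    :: String   -- the id field, unique per installed instance
  , packageId :: String   -- name-version, assumed to be what errors print
  , provides  :: [String] -- module names the instance defines
  }

-- Every visible unit that defines the requested module.
providersOf :: String -> [Unit] -> [Unit]
providersOf m = filter (elem m . provides)

-- Package identifiers listed in a modelled ambiguity report, or Nothing
-- when the module resolves to at most one distinct instance.
ambiguityReport :: String -> [Unit] -> Maybe [String]
ambiguityReport m units =
  case providersOf m units of
    us | length (nub (map unitId us)) > 1 -> Just (map packageId us)
    _ -> Nothing

main :: IO ()
main = do
  let m        = "Distribution.PackageDescription"
      global   = Unit "Cabal-syntax-3.8.1.0" "Cabal-syntax-3.8.1.0" [m]
      extraDep = Unit "Cabal-syntax-3.8.1.0-CC9Bfx5CE7X4XpP9arS2OL"
                      "Cabal-syntax-3.8.1.0" [m]
  -- Both instances visible: two distinct ids, one printed identifier.
  print (ambiguityReport m [global, extraDep])
  -- Only the GHC-supplied instance visible: no ambiguity.
  print (ambiguityReport m [global])
```

Under these assumptions, the first call yields the same identifier twice, which is the shape of the message in the CI log.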
The GHC 9.4.8 Users Guide has:
Package shadowing: When multiple package databases are in use it is possible, though rarely, that the same installed package id is present in more than one database. In that case, packages closer to the top of the stack will override (shadow) those below them. If the conflicting packages are found to be equivalent (by ABI hash comparison) then one of them replaces all references to the other, otherwise the overridden package and all those depending on it will be removed.
Package version selection: When selecting a package, GHC will search for packages in all available databases. If multiple versions of the same package are available the latest non-broken version will be chosen.
Version conflict resolution: If multiple instances of a package version chosen by GHC are available then GHC will choose an unspecified instance.
It is not clear to me if "the same installed package id is present" refers to (a) the id field of the *.conf file or (b) the package identifier in Cabal terms (e.g. Cabal-3.8.1.0).
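The difference between the two readings can be sketched as follows. This is a toy model under interpretation (a), that is, assuming shadowing is keyed on the id field. It also omits the ABI-hash comparison and the removal of dependent packages that the Users Guide describes. Entry, DbStack and visible are hypothetical names. The id and abi values are those from the two Cabal *.conf extracts above.

```haskell
module Main (main) where

import Data.Function (on)
import Data.List (nubBy)

-- A simplified *.conf entry: only the fields relevant to shadowing.
data Entry = Entry { entryId :: String, entryAbi :: String }
  deriving (Eq, Show)

-- Package databases, listed with the top of the stack first.
type DbStack = [[Entry]]

-- Under interpretation (a): for each id, keep only the entry closest to
-- the top of the stack. nubBy keeps the first occurrence it meets.
visible :: DbStack -> [Entry]
visible = nubBy ((==) `on` entryId) . concat

main :: IO ()
main = do
  let global   = Entry "Cabal-3.8.1.0" "c52458c52981c7dfb414e1264fc98757"
      extraDep = Entry "Cabal-3.8.1.0-9FYtgyN7c9W8mwkFl4FEqj"
                       "d42177acfcf8b8f3b778fbe4e0cfa473"
  -- Snapshot database on top of the global database.
  mapM_ print (visible [[extraDep], [global]])
```

In this model neither entry shadows the other, because the id fields differ even though both are Cabal-3.8.1.0 in Cabal terms. Under interpretation (b) the snapshot entry would override the global one.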
@hasufell, I have switched to macOS/AArch64, and I can't reproduce the problem locally on that platform either. I am using the master branch version of Stack, but nothing important should have changed since Stack 2.15.3.
Another observation:
The streamly library does not depend on Cabal but it has build-type: Configure.
At the point in @hasufell's CI run where the build fails at streamly > configure, the extra-dep Cabal-syntax has been built and registered (at 2024-03-16T10:34:18.3802200Z) but the extra-dep Cabal has not yet been built.
In my local build, which works, the extra-dep Cabal-syntax has not yet been built when streamly > configure begins.
This likely difference in the contents of the snapshot package database at the relevant time may explain why the problem is intermittent.
That (the explanation for intermittent failure) seems to be it: locally, if I delete the snapshot package database and start afresh with the example, but in two steps:
stack build Cabal-syntax
stack build streamly
the build fails (at the dependency network) with:
network > configure
network > [1 of 3] Compiling Main ( /private/var/folders/zp/dkvy_dtj31x04x3hcc29wp_00000gn/T/stack-a76fc1f394248344/network-3.1.4.0/Setup.hs, /private/var/folders/zp/dkvy_dtj31x04x3hcc29wp_00000gn/T/stack-a76fc1f394248344/network-3.1.4.0/.stack-work/dist/aarch64-osx/ghc-9.4.8/setup/Main.o )
network > [2 of 3] Compiling StackSetupShim ( /Users/mpilgrem/.stack/setup-exe-src/setup-shim-6HauvNHV.hs, /private/var/folders/zp/dkvy_dtj31x04x3hcc29wp_00000gn/T/stack-a76fc1f394248344/network-3.1.4.0/.stack-work/dist/aarch64-osx/ghc-9.4.8/setup/StackSetupShim.o )
network >
network > /Users/mpilgrem/.stack/setup-exe-src/setup-shim-6HauvNHV.hs:7:1: error:
network > Ambiguous module name ‘Distribution.PackageDescription’:
network > it was found in multiple packages:
network > Cabal-syntax-3.8.1.0 Cabal-syntax-3.8.1.0
network > |
network > 7 | import Distribution.PackageDescription (PackageDescription, emptyHookedBuildInfo)
network > |
EDIT: This also applies on operating systems other than macOS; it applies on Windows 11, for example.
I am looking at the Nothing branch of the getPackageArgs part of withCabal of withSingleContext. The code documentation states:
-- This branch is usually taken for builds, and is always taken
-- for `stack sdist`.
--
-- This approach is debatable. It adds access to the snapshot
-- package database for Cabal. There are two possible objections:
--
-- 1. This doesn't isolate the build enough; arbitrary other
-- packages available could cause the build to succeed or fail.
--
-- 2. This doesn't provide enough packages: we should also
-- include the local database when building local packages.
--
-- Currently, this branch is only taken via `stack sdist` or when
-- explicitly requested in the stack.yaml file.
What @hasufell is experiencing seems to me to be an instance of objection No. 1 in the code comment. I am wondering: for build-type: Configure should the approach to the build (in terms of package databases) of the 'setup-exe' be as for Simple? The only difference between them is the presence and running of a configure shell script.
EDIT: I am going to try that solution in a pull request.
@hasufell, in summary, you have uncovered a long-standing bug in Stack that affects the build of packages with build-type: Configure and an extra-dep Cabal-syntax, if the extra-dep happens to be registered before (a) an extra-dep Cabal is registered and (b) Stack compiles the setup executable.
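The condition in that summary can be written down as a small predicate. This is only a restatement of the summary in code, not something taken from Stack. BuildType, SnapshotState and affected are hypothetical names, and the model covers only the Simple and Configure build types discussed in this thread.

```haskell
module Main (main) where

-- Only the two build types discussed in this thread are modelled.
data BuildType = Simple | Configure
  deriving (Eq, Show)

-- Contents of the snapshot package database at the moment Stack
-- compiles the setup executable for the package.
data SnapshotState = SnapshotState
  { cabalSyntaxRegistered :: Bool -- extra-dep Cabal-syntax already registered
  , cabalRegistered       :: Bool -- extra-dep Cabal already registered
  }

-- True when, per the summary, the setup compile is expected to fail
-- with the ambiguous-module error.
affected :: BuildType -> SnapshotState -> Bool
affected bt s =
  bt == Configure && cabalSyntaxRegistered s && not (cabalRegistered s)

main :: IO ()
main = do
  -- The failing CI run: Cabal-syntax registered, Cabal not yet built.
  print (affected Configure (SnapshotState True False))
  -- The working local build: neither extra-dep registered yet.
  print (affected Configure (SnapshotState False False))
  -- Extra-dep Cabal registered before the setup compile.
  print (affected Configure (SnapshotState True True))
```

Because the outcome depends on which extra-deps happen to be registered first, the predicate's inputs vary with build scheduling, which is consistent with the failure being intermittent.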
I think I have a fix (#6526) and an ugly work-around in the interim. The work-around is to build the extra-dep Cabal before the rest of the build:
stack build Cabal
stack build
Great investigation. I'll check your branch out later and maybe run it through CI.
@hasufell, if you could run it through CI, that would be much appreciated. I am planning on releasing a Stack 2.15.5 soon (because of https://discourse.haskell.org/t/ann-stacks-default-source-for-list-of-stackage-snapshots-not-up-to-date/9023) and it would be good if I could include the resolution of this as part of that release.
@mpilgrem maybe we do a pre-release, that would make it easier?
I've no objection to a pre-release. I may make the exposure period shorter than the usual fortnight, given the nature of the 'new URL' fix.
https://github.com/haskell/ghcup-hs/pull/1031
I'll try to trigger it a couple times to see if it fails again
I have not been able to re-trigger the failure with the pre-release after 4 runs: https://github.com/haskell/ghcup-hs/actions/runs/8378001994?pr=1031
https://github.com/haskell/ghcup-hs/actions/runs/8307024100/job/22735795982
stack version 2.15.3
To reproduce:
git clone https://github.com/haskell/ghcup-hs.git
stack build
This seems to only happen on macOS (in CI).