commercialhaskell / stack

The Haskell Tool Stack
http://haskellstack.org
BSD 3-Clause "New" or "Revised" License
4k stars 843 forks source link

Irreproducibility: Code rebuilds only the first time I switch resolvers #5381

Closed nh2 closed 5 months ago

nh2 commented 4 years ago

When I switch between lts-14.27 and lts-13.26 repeatedly, stack rebuilds my code only on the first such switch.

This is illegitmate because my executable depends on (at least) conduit-extra, and that one's version switches between those LTSs.

Background / how I found this

I'm debugging an issue with my library:

The project for reproing (on Ubuntu 18.04): https://github.com/nh2/lz4-frame-conduit/tree/4616ec594426de2defda6432eec9cff04692bf7c

Command to run:

stack test && while (.stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test); do sleep 0.01; done

When I test after stack clean, my bug (a segfault or GHC panic) occurs with

resolver: lts-13.26

and not with

resolver: lts-14.27

The actual Stack issue

But when I switch between the 2 and run stack test without previous stack clean, most of the time stack doesn't rebuild the code, so then I cannot observe the difference when switching resolvers.

I get this strange behaviour:

stack clean
stack test # builds from scratch
stack test # naturally builds nothing

# I switch resolvers, lts-13.26 -> lts-14.27

stack test
# Rebuilds code; outputs:
#     lz4-frame-conduit-0.1.0.0: unregistering (local file changes: README.md app/Main.hs lz4-frame-conduit.cabal lz4/lib/lz4.c lz4/lib/lz4.h lz4/lib/lz4frame.c lz4/...)
# Why is that? None of these files changed.

# I switch resolvers back, lts-14.27 -> lts-13.26

stack test # Does not rebuild! No further resolver switching rebuilds.

So it rebuilds only the first time I switch resolvers (but for apparently the wrong reason), and then never again.

nh2 commented 4 years ago

stack test -v output leading to the local file changes: README.md ...:

[debug] Run process: /sbin/ldconfig -p
[debug] Process finished in 2ms: /sbin/ldconfig -p
[debug] Found shared library libtinfo.so.5 in 'ldconfig -p' output
[debug] Did not find shared library libtinfo.so.6
[debug] Did not find shared library libncursesw.so.6
[debug] Found shared library libgmp.so.10 in 'ldconfig -p' output
[debug] Did not find shared library libgmp.so.3
[debug] Potential GHC builds: standard
[debug] Found already installed GHC builds: standard
[debug] SELECT "id","actual_version","arch","ghc_path","ghc_size","ghc_modified","ghc_pkg_path","runghc_path","haddock_path","cabal_version","global_db","global_db_cache_size","global_db_cache_modified","info","global_dump" FROM "compiler_cache" WHERE "ghc_path"=?; [PersistText "/raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-8.6.5"]
[debug] Loaded compiler information from cache
[debug] Asking for a supported GHC version
[debug] Resolving package entries
[debug] Parsing the targets
[debug] Checking flags
[debug] SourceMap constructed
[debug] Starting to execute command inside EnvConfig
[debug] Finding out which packages are already installed
[debug] Run process: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --global --no-user-package-db dump --expand-pkgroot
[debug] Process finished in 27ms: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --global --no-user-package-db dump --expand-pkgroot
[debug] Ignoring package Cabal due to wanting version 2.4.1.0 instead of 2.4.0.1
[debug] Ignoring package haskeline due to wanting version 0.7.5.0 instead of 0.7.4.3
[debug] Run process: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --user --no-user-package-db --package-db /raid/stack/snapshots/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/pkgdb dump --expand-pkgroot
[debug] Process finished in 81ms: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --user --no-user-package-db --package-db /raid/stack/snapshots/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/pkgdb dump --expand-pkgroot
[debug] Run process: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --user --no-user-package-db --package-db /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/pkgdb dump --expand-pkgroot
[debug] Process finished in 19ms: /raid/stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg-8.6.5 --user --no-user-package-db --package-db /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/pkgdb dump --expand-pkgroot
[debug] Constructing the build plan
[debug] SELECT "id","directory","type","pkg_src","active","path_env_var","haddock" FROM "config_cache" WHERE "directory"=? AND "type"=?; [PersistText "/home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/",PersistText "lib:lz4-frame-conduit-0.1.0.0-8RAA6BcdNm21R7aXlL5jl8"]
[debug] SELECT "id", "config_cache_id", "index", "option" FROM "config_cache_dir_option" WHERE ("config_cache_id"=?) ORDER BY "index"; [PersistInt64 2035]
[debug] SELECT "id", "config_cache_id", "index", "option" FROM "config_cache_no_dir_option" WHERE ("config_cache_id"=?) ORDER BY "index"; [PersistInt64 2035]
[debug] SELECT "id", "config_cache_id", "ghc_pkg_id" FROM "config_cache_dep" WHERE ("config_cache_id"=?); [PersistInt64 2035]
[debug] SELECT "id", "config_cache_id", "component" FROM "config_cache_component" WHERE ("config_cache_id"=?); [PersistInt64 2035]
[debug] Start: getPackageFiles /home/niklas/src/haskell/lz4-frame-conduit/lz4-frame-conduit.cabal
[debug] Finished in 2ms: getPackageFiles /home/niklas/src/haskell/lz4-frame-conduit/lz4-frame-conduit.cabal
[debug] Checking if we are going to build multiple executables with the same name
[debug] Executing the build plan
[info] lz4-frame-conduit-0.1.0.0: unregistering (local file changes: README.md app/Main.hs lz4-frame-conduit.cabal lz4/lib/lz4.c lz4/lib/lz4.h lz4/lib/lz4frame.c lz4/...)
ardamose123 commented 4 years ago

I'll try to repro, but as a first hunch, I guess it's a build cache.

If you look at the .stack-work folder in your project after building it (e.g. by using stack test), there are a lot of files that result from past compilations, sorted by Cabal version in the dist folder and GHC version in the install folder.

Running stack clean wipes out the dist folder. The install folder remains with its wacky hashes.

This is the output I get from building one of my pet projects:

Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/flattendir-test/flattendir-test ...
Preprocessing executable 'flattendir-exe' for flattendir-0.1.0.0..
Building executable 'flattendir-exe' for flattendir-0.1.0.0..
[1 of 2] Compiling Main
[2 of 2] Compiling Paths_flattendir
Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/flattendir-exe/flattendir-exe ...
flattendir> copy/register
Installing library in [...]/flattendir/.stack-work/install/x86_64-linux/9570ddfa5c53856cf854aff6bc38a117a2b2ccd9c89d93289e368b54d0c78fa9/8.6.5/lib/x86_64-linux-ghc-8.6.5/flattendir-0.1.0.0-HGHxHxXKkpZLfvbfb4LpUM
Installing executable flattendir-exe in [...]/flattendir/.stack-work/install/x86_64-linux/9570ddfa5c53856cf854aff6bc38a117a2b2ccd9c89d93289e368b54d0c78fa9/8.6.5/bin
Registering library for flattendir-0.1.0.0..

The last lines indicate the compilation results are kept in the install folder, followed by the arch, followed by a hash of some sort, followed by the GHC version, followed by other stuff. This way, it may keep build results generated through both resolvers. Apparently, stack is reusing the cached stuff instead of building anew, which is desirable.

Now, the issue with your tests is interesting. It should not cache a failed build... maybe? Assuming that's the issue. I'll try to check what's going on.

qrilka commented 4 years ago

The thing that the project doesn't get rebuilt is explained by the fact that 2 LTSes give different implicit snapshots and no config change make it possible for them to coexist even for project packages. But probably it's not something intuitive as e.g. switch between a normal build and a test build invalidates project packages. The one thing which clearly needs an investigation is this thing:

#     lz4-frame-conduit-0.1.0.0: unregistering (local file changes: README.md app/Main.hs lz4-frame-conduit.cabal lz4/lib/lz4.c lz4/lib/lz4.h lz4/lib/lz4frame.c lz4/...)
# Why is that? None of these files changed.

I started looking into that but had no time yet.

ardamose123 commented 4 years ago

I cloned the repo. Noticed it already had lts-13.26 as resolver.

After running stack test and switching back and forth between the resolvers, I noticed no difference between the test outputs. I couldn't observe the GHC panic/segfault. I didn't see the unregistering thing in any of the runs. The tests run with both resolvers with pretty much the same output.

The project relies on C stuff and a *.cabal file instead of a package.yaml.

I'm using Ubuntu 18.04.4, stack 2.3.3, and gcc 7.5.0 if that's of some use.

Minor finding: when switching resolvers, the test compresses 1MB ByteString fails with a hClose: resource vanished (Broken pipe) message, but when I ran the test again without switching resolvers, the test fails with an expectation mismatch between a large string of * and an empty string.

nh2 commented 4 years ago

@ardamose123 Thanks for letting us know! That's interesting that it didn't unregister for you.

(On the point of the project's crash, which is unrelated of the stack issue: Yes, resource vanished is one of the ways that the underlying memory corruption can manifest -- I learned by now that it doesn't necessarily always segfault, see https://gitlab.haskell.org/ghc/ghc/-/issues/18680 for details on that.)

ardamose123 commented 4 years ago

Correction: I just saw the unregistering line by running stack test -v. I was not running tests with the -v flag before. Only happens when switching resolvers. This line doesn't appear if you run the tests again without changing resolvers.

However, as I mentioned before, I can build the project alright and run the tests while switching between resolvers, so it might not be related to the original issue.

@nh2, can you please describe your development environment w/ versions?

nh2 commented 4 years ago

I just saw the unregistering line by running stack test -v

@ardamose123 Do you also see local file changes: README.md app/Main.hs lz4-frame-conduit.cabal lz4/lib/lz4.c lz4/lib/lz4.h lz4/lib/lz4frame.c lz4/... in that case?

Also, when you switch resolvers, does it unregister each time, or only the first time (as it does for me)?

(For the purpose of this bug, I don't worry much about the test output, as that's either a bug in my library's code or a GHC bug. I do worry about stack's rebuilding/unregistering behaviour though.)

ardamose123 commented 4 years ago

Only first time, just after switching resolvers.

nh2 commented 4 years ago

Only first time, just after switching resolvers.

just for clarity, I mean when you switch resolvers forward and backward repeatedly.

ardamose123 commented 4 years ago

Just ran the tests a few more times. I only noticed the unregister after using stack clean:

  1. Ran stack clean
  2. Use resolver: lts-13.26
  3. Ran stack test -v » unregister appears.
  4. Ran stack test -v » logs are normal.
  5. Changed to resolver: lts-14.27
  6. Ran stack test -v » unregister appears.
  7. Ran stack test -v » logs are normal.
  8. Changed to resolver: lts-13.26
  9. Ran stack test -v » logs are normal.
  10. Ran stack test -v » logs are normal.
  11. Changed to resolver: lts-14.27
  12. Ran stack test -v » logs are normal.
  13. Ran stack test -v » logs are normal.
  14. Ran stack clean
  15. Ran stack test -v » unregister appears.

I thought I was getting the unregistering on every resolver switch, but I probably did a a stack clean in between my tests earlier.

This behavior LGTM, but I don't know if @nh2 is getting some other behavior.

qrilka commented 4 years ago

@nh2 I think I see the reason behind local file changes: README.md app/Main.hs... - after stack clean there is no build cache yet and thus Stack reports every package file as dirty, I think we just need to handle this case better and produce some more meaningful message. With that fixed I think there is nothing more to explore in this ticket. Let me know if you see something else unresolved.

nh2 commented 4 years ago

@qrilka The main problem was this:

# I switch resolvers back, lts-14.27 -> lts-13.26

stack test # Does not rebuild! No further resolver switching rebuilds.

where the expectation was that it should rebuild because the dependent libraries change.

I don't think we've found a solution/explanation for that yet (or did I miss it?).

@ardamose123 When you say "logs are normal", does it rebuild + relink the executable after switching resolvers?

qrilka commented 4 years ago

@nh2 see my answer in https://github.com/commercialhaskell/stack/issues/5381#issuecomment-692904536 - if you don't do stack clean you'll have 2 versions of your package built for 2 different implicit packages (one for each LTS).

nh2 commented 4 years ago

@qrilka Ah sorry, I missed that. I don't quite follow it though. I also don't quite understand @ardamose123's preceding comment about

results are kept in the install folder

because I'm doing nothing with install/, I'm using the contents of the build/ directory.

I still do observe that switching resolvers leaves outdated contents in the build directory:

Click to expand full log output ``` % git checkout 4616ec594426de2defda6432eec9cff04692bf7c % stack test lz4-frame-conduit-0.1.0.0: unregistering (local file changes: README.md app/Main.hs lz4-frame-conduit.cabal lz4/lib/lz4.c lz4/lib/lz4.h lz4/lib/lz4frame.c lz4/...) Building all executables for `lz4-frame-conduit' once. After a successful build of all of them, only specified executables will be rebuilt. lz4-frame-conduit> configure (lib + exe + test) Configuring lz4-frame-conduit-0.1.0.0... lz4-frame-conduit> build (lib + exe + test) Preprocessing library for lz4-frame-conduit-0.1.0.0.. Building library for lz4-frame-conduit-0.1.0.0.. [1 of 2] Compiling Codec.Compression.LZ4.CTypes [2 of 2] Compiling Codec.Compression.LZ4.Conduit Preprocessing executable 'haskell-lz4c' for lz4-frame-conduit-0.1.0.0.. Building executable 'haskell-lz4c' for lz4-frame-conduit-0.1.0.0.. [1 of 1] Compiling Main Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/haskell-lz4c/haskell-lz4c ... Preprocessing test suite 'lz4-frame-conduit-test' for lz4-frame-conduit-0.1.0.0.. Building test suite 'lz4-frame-conduit-test' for lz4-frame-conduit-0.1.0.0.. [1 of 1] Compiling Main Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test ... lz4-frame-conduit> copy/register Installing library in /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/lib/x86_64-linux-ghc-8.6.5/lz4-frame-conduit-0.1.0.0-8RAA6BcdNm21R7aXlL5jl8 Installing executable haskell-lz4c in /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/37319c6575a7d854bf30c6a8fcfdf1546086e900241daeda1f669d54761755ae/8.6.5/bin Registering library for lz4-frame-conduit-0.1.0.0.. lz4-frame-conduit> test (suite: lz4-frame-conduit-test) [...] Finished in 0.0168 seconds 2 examples, 0 failures lz4-frame-conduit> Test suite lz4-frame-conduit-test passed Completed 2 action(s). stack test 13.83s user 1.23s system 49% cpu 30.128 total % sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test 9d23a8dd39ccfe86c164d52c03d59f103a904eec .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test % # switch to lts-13.27 % stack test lz4-frame-conduit-0.1.0.0: unregistering (old configure information not found) lz4-frame-conduit> configure (lib + exe + test) Configuring lz4-frame-conduit-0.1.0.0... lz4-frame-conduit> build (lib + exe + test) Preprocessing library for lz4-frame-conduit-0.1.0.0.. Building library for lz4-frame-conduit-0.1.0.0.. [1 of 2] Compiling Codec.Compression.LZ4.CTypes [2 of 2] Compiling Codec.Compression.LZ4.Conduit Preprocessing test suite 'lz4-frame-conduit-test' for lz4-frame-conduit-0.1.0.0.. Building test suite 'lz4-frame-conduit-test' for lz4-frame-conduit-0.1.0.0.. [1 of 1] Compiling Main [Data.Conduit.Process changed] Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test ... Preprocessing executable 'haskell-lz4c' for lz4-frame-conduit-0.1.0.0.. Building executable 'haskell-lz4c' for lz4-frame-conduit-0.1.0.0.. [1 of 1] Compiling Main [Data.Conduit.Binary changed] Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/haskell-lz4c/haskell-lz4c ... lz4-frame-conduit> copy/register Installing library in /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/7d6d968be153d4dcc0edab19f77cd81934e5dbcbcb69dad0082c318db701ea3b/8.6.5/lib/x86_64-linux-ghc-8.6.5/lz4-frame-conduit-0.1.0.0-3OL8VoGxo9YCAWPxV870O2 Installing executable haskell-lz4c in /home/niklas/src/haskell/lz4-frame-conduit/.stack-work/install/x86_64-linux/7d6d968be153d4dcc0edab19f77cd81934e5dbcbcb69dad0082c318db701ea3b/8.6.5/bin Registering library for lz4-frame-conduit-0.1.0.0.. lz4-frame-conduit> test (suite: lz4-frame-conduit-test) [...] Finished in 0.0483 seconds 2 examples, 0 failures lz4-frame-conduit> Test suite lz4-frame-conduit-test passed Completed 2 action(s). stack test 7.79s user 0.91s system 97% cpu 8.894 total % sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test c0d644d1d1f9d90e241ba5191bbbf2ca15c3ce67 .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test % # switch to lts-13.26 % stack test lz4-frame-conduit> test (suite: lz4-frame-conduit-test) [...] Finished in 0.0543 seconds 2 examples, 0 failures lz4-frame-conduit> Test suite lz4-frame-conduit-test passed % sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test c0d644d1d1f9d90e241ba5191bbbf2ca15c3ce67 .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test ```

Summarised output of the above:

% stack test
Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test ...
2 examples, 0 failures

% sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test
9d23a8dd39ccfe86c164d52c03d59f103a904eec  .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test

% # switch to lts-13.27

% stack test
Linking .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test ...
2 examples, 0 failures

% sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test
c0d644d1d1f9d90e241ba5191bbbf2ca15c3ce67  .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test

% # switch to lts-13.26

% stack test
2 examples, 0 failures

% sha1sum .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test
c0d644d1d1f9d90e241ba5191bbbf2ca15c3ce67  .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test

As shown, I switch resolvers, and then the binary at .stack-work/dist/x86_64-linux/Cabal-2.4.0.1/build/lz4-frame-conduit-test/lz4-frame-conduit-test is not updated by stack test (it's the same binary as from before I switched resolvers).

qrilka commented 4 years ago

Ouch, I didn't take into account that test executables don't go into snapshot-specific directory... I guess this is just another case of ghc-pkg(?) treating executables and packages differently and Stack got it wrong again... And maybe component-based builds could help here as well? :) Stack sees your library in .stack-work/dist/x86_64-linux/XXX and thus it assumes that nothing needs to be rebuilt. Ok, I will take a look into some way out of this (hopefully without going into component-based builds).

qrilka commented 4 years ago

After trying to fix this I came to conclusion that this doesn't look to be easily resolvable with the way stack builds work and the way to fix it properly is to do proper component-based builds as in #4745 The problem with an attempt to fix this seem to break the current way Stack excludes some rebuilds.

mpilgrem commented 5 months ago

I am going to close this issue in favour of #6356, as it appears to be an instance of component-based builds.