commercialhaskell / stack

The Haskell Tool Stack
http://haskellstack.org
BSD 3-Clause "New" or "Revised" License
3.95k stars 842 forks source link

Constructing the build plan is slow in project with many packages #6551

Closed wraithm closed 2 months ago

wraithm commented 3 months ago

General summary

I'm testing out upgrading from 2.13.1 to 2.15.5 for our company monorepo. I've found that there has been a serious performance drop for stack build in the newest version. Though, I noticed that 2.13.1 is also significantly slower than 2.7.5.

stack build has gotten slower when there's nothing to build with many packages. If one runs stack build followed immediately by another stack build (no-op), we see that the no-op build takes a significant amount of time, when I would expect it to be pretty fast (ideally <2 s).

We have >80 package.yamls in our repo. When I do a verbose build, I've noticed that after the Constructing the build plan log line, things go slowly. It seems that there's a sqlite database query for every package that's being run sequentially or something like that, and the time accumulates. There's references to getPackageFiles taking some amount of time. I assume the SQL queries are related to getPackageFiles in some way.

Here's a table showing my computer's run-time of the no-op stack build for various stack versions:

Stack Version No-op build time
2.15.5 ~22 s
2.15.3 ~22 s
2.13.1 ~7 s
2.11.1 ~29 s
2.9.3 ~22 s
2.9.1 ~22 s
2.7.5 ~2 s

stack-2.7.5 is pretty reasonable, 2 s is totally acceptable for a big repo like ours. We're on 2.13.1 now clocking in at around 7 s, which is still annoying but workable. 22 s for a no-op build is pretty brutal.

Steps to reproduce

For example:

  1. Run stack build
  2. Run stack build again (should be a no-op)

Expected vs Actual

I'd expect the second stack build to be nearly instantaneous, but it's actually quite slow (see table above).

I've attached a gist of some censored debug output: https://gist.github.com/wraithm/86968037f99e8a10bf46e47f7d9abe6f

This gist has two files generated with:

stack-<stack version> --verbose build 2> >(rg getPackageFiles | sed -e 's/getPackageFiles.*/getPackageFiles/' > getPackageFiles-<stack version>.log)

where <stack version> is either 2.15.5 or 2.13.1, which corresponds to builds with those versions of stack.

You can see that the 2.15.5 one took about 22 s and the 2.13.1 one took about 7 s. In between those getPackageFiles log lines are SQL queries.

The SQL queries in the log look like this (again censored):

2024-04-05 15:19:14.413617: [debug] (SQL) SELECT "id","directory","type","pkg_src","active","path_env_var","haddock" FROM "config_cache" WHERE "directory"=? AND "type"=?; [PersistText "/path/to/repo/.stack-work/install/x86_64-linux-tinfo6/2ed63679412a6629a520726d957ec790c8a864b0615bc44b20bcbbcb34ba4597/9.4.8/",PersistText "lib:library-name"]
2024-04-05 15:19:14.413764: [debug] (SQL) SELECT "id", "config_cache_id", "index", "option" FROM "config_cache_dir_option" WHERE ("config_cache_id"=?) ORDER BY "index"; [PersistInt64 501]
2024-04-05 15:19:14.413916: [debug] (SQL) SELECT "id", "config_cache_id", "index", "option" FROM "config_cache_no_dir_option" WHERE ("config_cache_id"=?) ORDER BY "index"; [PersistInt64 501]
2024-04-05 15:19:14.414118: [debug] (SQL) SELECT "id", "config_cache_id", "ghc_pkg_id" FROM "config_cache_dep" WHERE ("config_cache_id"=?); [PersistInt64 501]
2024-04-05 15:19:14.414250: [debug] (SQL) SELECT "id", "config_cache_id", "component" FROM "config_cache_component" WHERE ("config_cache_id"=?); [PersistInt64 501]
2024-04-05 15:19:14.414528: [debug] Start: getPackageFiles /path/to/repo/path/to/packagename/packagename.cabal
2024-04-05 15:19:14.416066: [debug] Finished in 1ms: getPackageFiles /path/to/repo/path/to/packagename/packagename.cabal

Here's our (abbreviated) stack.yaml:

resolver: lts-21.22

flags:
  zlib:
    pkg-config: true
  xlsx:
    microlens: true

ghc-options:
  "$locals": -Werror
  "$everything": -haddock

Stack version

I'm mostly testing ghcup's stack-2.15.5:

$ stack --version
Version 2.15.5, Git revision 5649cc6b2522f51f0fc5543154b0fff868f9af31 x86_64 hpack-0.36.0

Method of installation

I've tried both ghcup and nixpkgs for most of the versions in the table there, but the numbers all work out to be the same regardless of nixpkgs vs ghcup.

Platform

Your platform (machine architecture and operating system)

Intel x86_64 - Ubuntu 22.04

mpilgrem commented 3 months ago

@wraithm, thanks for reporting. The major changes between Stack 2.13.1 and 2.15.1 were those that allowed Stack to build packages with dependencies on public sub-libraries of other packages. I am pinging @theobat, who might have an insight.

wraithm commented 3 months ago

Something a little suspicious that would lead me to think this doesn't have anything to do with sub-libs though is that my experiments with 2.9.x also were slow and almost exactly the same as 2.15.x.

mpilgrem commented 3 months ago

@wraithm, you mention an attached gist. Sorry if it is my oversight, but I cannot see where it is attached. Can you give me a pointer?

wraithm commented 3 months ago

@wraithm, you mention an attached gist. Sorry if it is my oversight, but I cannot see where it is attached. Can you give me a pointer?

Sorry, I'm working on that rn. It'll be up in a bit.

mpilgrem commented 3 months ago

@wraithm, you have a need to censor - if you happen to know of any public multi-package project I could investigate with, that would be helpful. (EDIT: Although, having said that, I suppose a first step is to understand how two-package projects behave as between Stack 2.13.1 and 2.15.5.)

wraithm commented 3 months ago

@mpilgrem Here's the gist: https://gist.github.com/wraithm/86968037f99e8a10bf46e47f7d9abe6f I also added it to the original issue description. I generated those two files with the following command:

stack-2.15.5 --verbose build 2> >(rg getPackageFiles | sed -e 's/getPackageFiles.*/getPackageFiles/' > getPackageFiles-2.15.5.log)

(and equivalently for 2.13.1)

I'm gonna add more description above.

wraithm commented 3 months ago

@wraithm, you have a need to censor - if you happen to know of any public multi-package project I could investigate with, that would be helpful. (EDIT: Although, having said that, I suppose a first step is to understand how two-package projects behave as between Stack 2.13.1 and 2.15.5.)

That's a great question wrt public repos, @mpilgrem. I can't think of a really big one quite like ours... I guess there's things like amazonka, but that has weird codegen and isn't quite analogous. Hm. I'll look at hackage and see if there's anything interesting to compare to.

I'm more than happy to be a guinea pig! I can build stack and test patches and stuff. I've contributed to stack in the past (very minorly).

btw, I added some more context in the issue description.

wraithm commented 3 months ago

I'd be really suspicious of something that was added to 2.9.x, then removed for 2.11.x or 2.13.x, then re-introduced into 2.15.x. Because 2.9.x behaves almost exactly like 2.15.x!

mpilgrem commented 3 months ago

Stack.PackageFile.getPackageFile had some work done to it, as part of the changes that I mentioned above (eg https://github.com/commercialhaskell/stack/commit/a5ff987b0817873666ecf2c8e7dd44ce9747d5c6). Understanding the effect of those seems a sensible place to start.

EDIT: Perhaps, more specifically: Stack.PackageFile.packageDescModulesAndFiles.

wraithm commented 3 months ago

I just did 2.11.1, and it takes around ~29 s on average for a no-op build. (Added to the table)

mpilgrem commented 3 months ago

I do not know why Stack 2.13.1 should have been so much faster than Stack 2.11.1 for a 'no op' stack build - another puzzle.

wraithm commented 3 months ago

Yeah, it's truly baffling. I've tried both ghcup and nixpkgs, and it's all really consistent. <2.7.5 and randomly 2.13.1 are randomly fast. Everything else is slow!

Also, I just built stack on master locally and got ~22 s.

wraithm commented 3 months ago

I found that about a cumulative 14 s is spent in the processDep function, specifically, in the call to getLatestApplicableVersionAndRev within processDep. It takes several milliseconds for getHackagePackageVersions to run, so it seems to just add up. That at least accounts for a significant chunk of the time.

(22 - 14 = 8, which is pretty close to what stack-2.13.1 takes to run!)

I also built a version of stack-2.13.1 locally and found that getHackagePackageVersions doesn't get called! So, I think that is our culprit. getHackagePackageVersions isn't getting called during the build plan on 2.13.1 and it's getting called on basically every single dependency in 2.15.1, which adds up.

wraithm commented 3 months ago

@mpilgrem , I bisected it down to this commit: https://github.com/commercialhaskell/stack/commit/86e73a7bcfe1532cfa6bea5d1d55565d4b4e76cd

Looks like getHackagePackageVersions is getting called on all of our deps (multiple times for the same dep, once for every dep of every package). Before this commit, getHackagePackageVersions is not called at all on a no-op build.

In our repo, the commit before 86e73a7bcfe1532cfa6bea5d1d55565d4b4e76cd takes 7 s to do a no-op build and 86e73a7bcfe1532cfa6bea5d1d55565d4b4e76cd takes 22-25 s.

mpilgrem commented 3 months ago

@wraithm, many thanks for the detective work. The following are some notes for my own use:

In Stack 2.13.1, getHackagePackageVersions is used only in getLatestApplicableVersionAndRev and that is used in only two places as part of yielding a ConstructPlanException:

(1) in addPackageDeps, when the result of getCachedDepOrAddDep depname yields a Left e; and

(2) in addPackageDeps, when the result of getCachedDepOrAddDep depname yields a Right adr but inRange is False.

The stated intent of https://github.com/commercialhaskell/stack/commit/86e73a7bcfe1532cfa6bea5d1d55565d4b4e76cd was refactoring.

By Stack 2.15.5, getLatestApplicableVersionAndRev has been moved to become a top level function of Stack.Build.ConstructPlan. It now features in processDep but only once (extracts):

processDep pkgId name value = do
  mLatestApplicable <- getLatestApplicableVersionAndRev name range  -- <<<< It has been moved here
  eRes <- getCachedDepOrAddDep name
  case eRes of
    Left e -> do
      ...
      pure $ Left (name, (range, mLatestApplicable, bd))
    ...
    Right adr -> do
      ...
      pure $ if inRange
        then ...
        else Left
          ( name
          , ( range
            , mLatestApplicable
            , DependencyMismatch $ adrVersion adr
            )
          )

So, I think the solution is likely to move it back into the branches where it applies. EDIT: If it proves to be the solution, should I be disappointed that GHC does not work out itself that mLatestApplicable will not be needed in the other branches? EDIT2: I think the answer is no: the result of the action may not be needed, but the action itself may have changed the state of the world.

wraithm commented 2 months ago

Amazing! I confirmed that this does work. Our no-op build times are around 7-8 s with this patch in #6552.

EDIT: If it proves to be the solution, should I be disappointed that GHC does not work out itself that mLatestApplicable will not be needed in the other branches?

You might be disappointed, but the M monad is gonna evaluate those args every time, I suspect it's because it's a strict WriterT, or maybe the RIO base in that transformer stack is bringing in the strictness, but yeah, long story short, the monad bind here is strict, and it'll evaluate every time. 👍🏻

Are there other places where that sort of refactor has occurred? That might account for the other 5 s or so difference from 2.7.5.

mpilgrem commented 2 months ago

@wraithm, on 2.7.5 (2 s) to 2.13.1 (7 s), I am looking at the change log but nothing is jumping out at me. However, that log only focusses on things visible to users. Since I took over the release process (from Stack 2.9.1) there has not been much change that has involved stack build, other than that associated with https://github.com/commercialhaskell/stack/issues/6551#issuecomment-2040677615. However, I'll be alert to the sort of thing that caused this regression.

mpilgrem commented 2 months ago

I will open a separate issue for Stack 2.7.5 to Stack 2.13.1.