Closed wraithm closed 2 months ago
@wraithm, thanks for reporting. The major changes between Stack 2.13.1 and 2.15.1 were those that allowed Stack to build packages with dependencies on public sub-libraries of other packages. I am pinging @theobat, who might have an insight.
Something a little suspicious that would lead me to think this doesn't have anything to do with sub-libs though is that my experiments with 2.9.x also were slow and almost exactly the same as 2.15.x.
@wraithm, you mention an attached gist. Sorry if it is my oversight, but I cannot see where it is attached. Can you give me a pointer?
@wraithm, you mention an attached gist. Sorry if it is my oversight, but I cannot see where it is attached. Can you give me a pointer?
Sorry, I'm working on that rn. It'll be up in a bit.
@wraithm, you have a need to censor - if you happen to know of any public multi-package project I could investigate with, that would be helpful. (EDIT: Although, having said that, I suppose a first step is to understand how two-package projects behave as between Stack 2.13.1 and 2.15.5.)
@mpilgrem Here's the gist: https://gist.github.com/wraithm/86968037f99e8a10bf46e47f7d9abe6f I also added it to the original issue description. I generated those two files with the following command:
stack-2.15.5 --verbose build 2> >(rg getPackageFiles | sed -e 's/getPackageFiles.*/getPackageFiles/' > getPackageFiles-2.15.5.log)
(and equivalently for 2.13.1)
I'm gonna add more description above.
@wraithm, you have a need to censor - if you happen to know of any public multi-package project I could investigate with, that would be helpful. (EDIT: Although, having said that, I suppose a first step is to understand how two-package projects behave as between Stack 2.13.1 and 2.15.5.)
That's a great question wrt public repos, @mpilgrem. I can't think of a really big one quite like ours... I guess there's things like amazonka, but that has weird codegen and isn't quite analogous. Hm. I'll look at hackage and see if there's anything interesting to compare to.
I'm more than happy to be a guinea pig! I can build stack and test patches and stuff. I've contributed to stack
in the past (very minorly).
btw, I added some more context in the issue description.
I'd be really suspicious of something that was added to 2.9.x, then removed for 2.11.x or 2.13.x, then re-introduced into 2.15.x. Because 2.9.x behaves almost exactly like 2.15.x!
Stack.PackageFile.getPackageFile
had some work done to it, as part of the changes that I mentioned above (eg https://github.com/commercialhaskell/stack/commit/a5ff987b0817873666ecf2c8e7dd44ce9747d5c6). Understanding the effect of those seems a sensible place to start.
EDIT: Perhaps, more specifically: Stack.PackageFile.packageDescModulesAndFiles
.
I just did 2.11.1, and it takes around ~29 s on average for a no-op build. (Added to the table)
I do not know why Stack 2.13.1 should have been so much faster than Stack 2.11.1 for a 'no op' stack build
- another puzzle.
Yeah, it's truly baffling. I've tried both ghcup and nixpkgs, and it's all really consistent. <2.7.5 and randomly 2.13.1 are randomly fast. Everything else is slow!
Also, I just built stack
on master
locally and got ~22 s.
I found that about a cumulative 14 s is spent in the processDep
function, specifically, in the call to getLatestApplicableVersionAndRev
within processDep
. It takes several milliseconds for getHackagePackageVersions
to run, so it seems to just add up. That at least accounts for a significant chunk of the time.
(22 - 14 = 8, which is pretty close to what stack-2.13.1 takes to run!)
I also built a version of stack-2.13.1
locally and found that getHackagePackageVersions
doesn't get called! So, I think that is our culprit. getHackagePackageVersions
isn't getting called during the build plan on 2.13.1 and it's getting called on basically every single dependency in 2.15.1, which adds up.
@mpilgrem , I bisected it down to this commit: https://github.com/commercialhaskell/stack/commit/86e73a7bcfe1532cfa6bea5d1d55565d4b4e76cd
Looks like getHackagePackageVersions
is getting called on all of our deps (multiple times for the same dep, once for every dep of every package). Before this commit, getHackagePackageVersions
is not called at all on a no-op build.
In our repo, the commit before 86e73a7bcfe1532cfa6bea5d1d55565d4b4e76cd takes 7 s to do a no-op build and 86e73a7bcfe1532cfa6bea5d1d55565d4b4e76cd takes 22-25 s.
@wraithm, many thanks for the detective work. The following are some notes for my own use:
In Stack 2.13.1, getHackagePackageVersions
is used only in getLatestApplicableVersionAndRev
and that is used in only two places as part of yielding a ConstructPlanException
:
(1) in addPackageDeps
, when the result of getCachedDepOrAddDep depname
yields a Left e
; and
(2) in addPackageDeps
, when the result of getCachedDepOrAddDep depname
yields a Right adr
but inRange
is False
.
The stated intent of https://github.com/commercialhaskell/stack/commit/86e73a7bcfe1532cfa6bea5d1d55565d4b4e76cd was refactoring.
By Stack 2.15.5, getLatestApplicableVersionAndRev
has been moved to become a top level function of Stack.Build.ConstructPlan
. It now features in processDep
but only once (extracts):
processDep pkgId name value = do
mLatestApplicable <- getLatestApplicableVersionAndRev name range -- <<<< It has been moved here
eRes <- getCachedDepOrAddDep name
case eRes of
Left e -> do
...
pure $ Left (name, (range, mLatestApplicable, bd))
...
Right adr -> do
...
pure $ if inRange
then ...
else Left
( name
, ( range
, mLatestApplicable
, DependencyMismatch $ adrVersion adr
)
)
So, I think the solution is likely to move it back into the branches where it applies. EDIT: If it proves to be the solution, should I be disappointed that GHC does not work out itself that mLatestApplicable
will not be needed in the other branches? EDIT2: I think the answer is no: the result of the action may not be needed, but the action itself may have changed the state of the world.
Amazing! I confirmed that this does work. Our no-op build times are around 7-8 s with this patch in #6552.
EDIT: If it proves to be the solution, should I be disappointed that GHC does not work out itself that
mLatestApplicable
will not be needed in the other branches?
You might be disappointed, but the M
monad is gonna evaluate those args every time, I suspect it's because it's a strict WriterT
, or maybe the RIO
base in that transformer stack is bringing in the strictness, but yeah, long story short, the monad bind here is strict, and it'll evaluate every time. 👍🏻
Are there other places where that sort of refactor has occurred? That might account for the other 5 s or so difference from 2.7.5.
@wraithm, on 2.7.5 (2 s) to 2.13.1 (7 s), I am looking at the change log but nothing is jumping out at me. However, that log only focusses on things visible to users. Since I took over the release process (from Stack 2.9.1) there has not been much change that has involved stack build
, other than that associated with https://github.com/commercialhaskell/stack/issues/6551#issuecomment-2040677615. However, I'll be alert to the sort of thing that caused this regression.
I will open a separate issue for Stack 2.7.5 to Stack 2.13.1.
General summary
I'm testing out upgrading from 2.13.1 to 2.15.5 for our company monorepo. I've found that there has been a serious performance drop for
stack build
in the newest version. Though, I noticed that 2.13.1 is also significantly slower than 2.7.5.stack build
has gotten slower when there's nothing to build with many packages. If one runsstack build
followed immediately by anotherstack build
(no-op), we see that the no-op build takes a significant amount of time, when I would expect it to be pretty fast (ideally <2 s).We have >80
package.yaml
s in our repo. When I do a verbose build, I've noticed that after theConstructing the build plan
log line, things go slowly. It seems that there's a sqlite database query for every package that's being run sequentially or something like that, and the time accumulates. There's references togetPackageFiles
taking some amount of time. I assume the SQL queries are related togetPackageFiles
in some way.Here's a table showing my computer's run-time of the no-op
stack build
for various stack versions:stack-2.7.5 is pretty reasonable, 2 s is totally acceptable for a big repo like ours. We're on 2.13.1 now clocking in at around 7 s, which is still annoying but workable. 22 s for a no-op build is pretty brutal.
Steps to reproduce
For example:
stack build
stack build
again (should be a no-op)Expected vs Actual
I'd expect the second
stack build
to be nearly instantaneous, but it's actually quite slow (see table above).I've attached a gist of some censored debug output: https://gist.github.com/wraithm/86968037f99e8a10bf46e47f7d9abe6f
This gist has two files generated with:
where
<stack version>
is either2.15.5
or2.13.1
, which corresponds to builds with those versions of stack.You can see that the
2.15.5
one took about 22 s and the2.13.1
one took about 7 s. In between thosegetPackageFiles
log lines are SQL queries.The SQL queries in the log look like this (again censored):
Here's our (abbreviated)
stack.yaml
:Stack version
I'm mostly testing ghcup's stack-2.15.5:
Method of installation
I've tried both ghcup and nixpkgs for most of the versions in the table there, but the numbers all work out to be the same regardless of nixpkgs vs ghcup.
Platform
Your platform (machine architecture and operating system)
Intel x86_64 - Ubuntu 22.04