Open simonmichael opened 6 months ago
In terms of history, this was Michael Snoyman's appeal from November 2022: https://www.snoyman.com/blog/seeking-new-stackage-curator/. I think the curator team are all volunteers.
I think that merging within a couple of days/weeks is a reasonable service level.
Stackage curators usually give a decent heads-up when a package is up for removal because of outdated dependencies. If an open-source maintainer ignores the warning or takes weeks/months to make a patch, I would not say that the bottleneck is with curators when a package is added back. The simplest solution is not to get kicked out of Stackage in the first place.
Thanks for your input @Bodigrim. That's a scenario different from what I'm experiencing and calling attention to. I think I'm right in saying you have added a bunch of libraries to stackage in the past, and you haven't had to touch them again. My experience is more of getting end-user applications with big dependency footprints into stackage, and more importantly keeping them there as they or their deps get regularly pushed out with every GHC version bump or incompatible lib release.
My experience is more of getting end-user applications with big dependency footprints into stackage, and more importantly keeping them there as they or some number of deps get regularly pushed out with every GHC version bump or incompatible lib release.
So assuming that you waited for all your dependencies to catch up and make releases, which as I imagine easily takes months, is Stackage PR merge time really a bottleneck in the process? Or is it more of an annoying last mile?
More of an annoying last mile, certainly. I probably still haven't made my point clearly: it's that I feel
Hi @simonmichael, here is @alaendle - one of the contributors. First thanks for your contributions and I'm sorry that you're experiencing "friction". However, I'd like to add that from my perspective there is no "real" problem (sure, things could always be better) - and that we all maintain this on a volunteer unpaid service. Maybe you had bad luck with your PR's - there are times where it is hard to establish a working package-set and during these times PR's are usually not merged (which makes no difference since if the build fails no new package-set is published). Also I tried to gain a better understanding of our PR processing times. By using https://www.repotrends.com/commercialhaskell/stackage we get:
So to be honest I don't know what to do; any concrete proposal that would not lead to more pressure on the curator team?
Hello. Stackage curator here. I try to review the PRs and merge them as soon as the nightly set is green again, so every delay in merging them is caused by hardness in getting a working-package set. I think all curators are following a similar process
getting end-user applications with big dependency footprints into stackage, and more importantly keeping them there
My hot take is that Stackage isn't really for this. End users reap little to no benefit from an application being in or staying in Stackage; most end users don't have Haskell tools installed and would be better served by a pre-built distribution.
Stackage is more about providing a stable set of libraries for Haskell developers rather than being an application distribution platform. (The exception being applications used as Haskell build tools.)
Thanks to the curators responding so far. I hope it's clear I'm thankful for your work and aiming to help not harass. Excuse any blunt tone, it's just laziness/hurry. :)
My hot take is that Stackage isn't really for this.
I see stackage as the road to getting haskell software widely packaged; distros/packaging systems naturally use it as their starting point. It also helps me provide reliable end-user install instructions/procedures when platform packages aren't available. Similarly it helps me reliably install unpackaged tools in CI workflows. So when you say application, think also of all the tools written in haskell that you'd like to use. For these reasons I find it important to get applications into stackage, not just libraries.
every delay in merging them is caused by hardness in getting a working-package set
This is useful insight! So it seems like every PR, even ones that seem trivial and zero risk, is gated on the entire stackage build process working, is that right ? This seems likely to cause delays that often seem weird to contributors, who aren't aware of the bigger picture.
we all maintain this on a volunteer unpaid service
I'm well aware and grateful. And maybe this should change ? It's one of the things we could think about. I donate to the Haskell Foundation, and I donate time and software to (unknown) well funded corporations, possibly a few of which also fund the Foundation. I would like to have some fraction of this going to support the people maintaining key infrastructure like Stackage.
there are times where it is hard to establish a working package-set and during these times PR's are usually not merged
It makes total sense. Coupled with that, I believe the current system is that a single person is responsible for curating all of stackage, and this person changes each week. Both of these are very opaque to contributors and I think explain why PR merge or even response time has very high variance. Variance / lack of response makes contributing costly. Sometimes they go right in, sometimes they are ignored and you have to keep your attention on it and keep checking back, and eventually make noise and nag people until done, and none of us like that.
any concrete proposal that would not lead to more pressure on the curator team?
Yes, here's a few from me:
We raise some attention and discussion on this and see if we have some agreement on issues and support for trying to improve things. In progress.
Stackage curators maintain a presence in the #haskell-stack:matrix.org (and optionally #haskell-stack @ libera IRC) rooms. Chat need not be synchronous (especially Matrix), but it complements the issue tracker with a faster and often more efficient communication channel. (Not Slack. Haskell is open source, all of Stackage is open source, Matrix and IRC are where the open source maintainers live.)
Adjust the stackage curation system so more than one person is involved at a time. Eg try to get to a more conventional setup where all maintainers lurk in the chat, they help when they have time, and a release manager or two drives things along in each release cycle (as in the Cabal project).
Ask the Haskell Foundation for advice and perhaps funding. Eg as I suggested above, stipends for curators, or fund an ongoing part-time person. @chreekat is a great example of this working.
Renew attempts to attract more curators, help, and energy, with an updated more responsive FOSS-style process (drop the opaque application form dated 2022, build more chat presence, etc.)
These depend also on the current stackage team's availability, energy, desires, which I don't know. But these are a few ideas.
Last post for the day. You core contributors will have much insight based on your greater experience with the work and the problems; I hope this at least starts a conversation, and I hope to hear about the goals and challenges and solutions as you see them. Haskell, Hackage, Stackage, onward and upward!
(Um.. if you are reading this in the Azores.)
To complement @alaendle's useful data, I thought I'd better check my own (anec)data too. Here are pull requests I've been involved in and the days to resolve. Yes I went overboard on this, but
2024
2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
Well done to @simonmichael for raising this issue — it is important to raise awareness with all parties about what is happening here and to have clear open discussions.
It is probably worth bearing in mind the following points.
Thanks @cdornan!
Of people responding here so far, I seem to be the only one feeling a problem. Perhaps there aren't many people with usage patterns like mine. Also, for me getting back into stackage is often the last mile after a series of fixes across multiple projects, so patience can be a little short after that. Also, you curators seem happy with the current setup. So I won't belabour this much more..
But I still think more efficiency/less waste is usually a win and some of the ideas are worth considering. I'll throw out a few more:
there are times where it is hard to establish a working package-set and during these times PR's are usually not merged (which makes no difference since if the build fails no new package-set is published)
I think it makes a difference - for me the more important event is when the PR is resolved, so I can free up attention for other things. So,
Also:
Make it easy to see who is "on duty", so contributors know who to talk to when there's a problem.
Make it easier to see the current queue / dependencies / state of the build, so contributors have a better awareness of when delays are to be expected.
Have a bot (or lower-tech solution) to help ensure that PRs are not overlooked and lost, eg escalating them after a week, reducing uncertainty and the need for monitoring.
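The "escalate after a week" idea can be sketched in a few lines. This is only an illustration, not an existing Stackage tool: the `PullRequest` record, its fields, and the seven-day threshold are assumptions made up for the example, and fetching real PR data is left out entirely.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class PullRequest:
    # Hypothetical record; a real bot would fill these in from the issue tracker.
    number: int
    opened: date
    last_curator_reply: Optional[date] = None


def needs_escalation(pr: PullRequest, today: date,
                     max_quiet: timedelta = timedelta(days=7)) -> bool:
    """True if the PR has had no curator reply for at least `max_quiet`."""
    last_activity = pr.last_curator_reply or pr.opened
    return today - last_activity >= max_quiet


def stale_prs(prs: List[PullRequest], today: date) -> List[int]:
    """Numbers of the PRs that should be flagged for the curator on duty."""
    return [pr.number for pr in prs if needs_escalation(pr, today)]
```

Run once a day, something like this would give contributors a predictable upper bound on how long a PR can sit unnoticed, without asking curators to respond any faster than they already do.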
Sorry I was a little overloaded this week and only finally got to catch up today.
tldr: We try our best - I feel it is okay if a PR takes a couple of days to get merged, it is not the end of the world. Of course ideally it should happen within 24 hours but we are all volunteers and it is a pretty manual process keeping nightly running. Certainly it would be unreasonable to expect an instant response, right?
Not sure what you mean about unrelated errors: well, the (CI) check script just reports whether stackage is currently bounds clean or not, that is all. It is usually pretty easy to see whether the bounds issues are related to the PR or not :man_shrugging: - that is not a bottleneck I would say.
There are also occasions when major breakage shows up and it can take quite a bit of untangling to get things running again. Fortunately that is seldom now (it was worst when we tried to have more than one lts version) and luckily that was not the case this week: just a few revisions which broke the bounds for some testsuites mostly.
One suggestion would be to let other curators merge PRs - but this can potentially step on the person-on-duty's toes. There are also timezone issues - perhaps we should be more transparent about who is on duty (already suggested). But I also don't think people should be pinging curators just because their PR didn't get merged instantly. I honestly don't think there is much we can do except work harder...: of course you could also join the stackage team to help out :-) Well if someone can make a stackage build webapp then it could all be more interactive and opened up, but that is a dream/big undertaking.
Not sure what you mean about unrelated errors:
@juhp I mean eg failures like the one at #7405 which seem unrelated to the PR itself. I've had this happen before, and it's a bit confusing to the contributor, it masks the status of the PR, and requires more back and forth communication and retries.
One suggestion would be to let other curators merge PRs - but this can potentially step on the person-on-duty's toes.
That's how I expected this worked - the normal parallelism for "unrelated PRs" that you'd see in big packaging systems, like homebrew etc. Now I think I see that stackage PRs are really highly sequential.
What sort of stepping on toes are you thinking of ? I was imagining "harmless changes" like my recent https://github.com/commercialhaskell/stackage/pull/7405/commits/0fb3c519de68f27de6af6d235540d763b7a09e03 re-enabling a known-to-build package as easy to merge even if other packages are having problems. Is that too risky an assumption in general ?
For people skimming this discussion: I hope it's clear throughout that as a volunteer FOSS maintainer myself I don't want anyone to work harder. I want us all to work less / get more bang for our time. :)
For what it's worth, I'd like to give a quick summary of the curation process at the upcoming ecosystem workshop colocated at Zurihac. I'll include some motivation for why I think Stackage is important to the whole ecosystem, and I'd like to encourage contributions as well.
To do the encouraging, I think we need better descriptions of what is actually going on behind the curtain. That's why I've spent most of my time the last few weeks figuring out what all the code actually does. People (not least of all myself) need hooks for entry.
Not all of what I've learned should (or can) go into the presentation, but hopefully it will be a starting point. Happily I think this actually hits on some of @simonmichael 's concrete suggestions above. :)
Will the slides be available after the event? Or a recording/livestream? Unfortunately I cannot come but would have been really interested in seeing how the curation work is seen from the outside and what we can improve
One suggestion would be to let other curators merge PRs - but this can potentially step on the person-on-duty's toes.
What sort of stepping on toes are you thinking of ? I was imagining "harmless changes" like my recent 0fb3c51 re-enabling a known-to-build package as easy to merge even if other packages are having problems. Is that too risky an assumption in general ?
Let me try to expand/explain better, we have one person "on-duty" each week - basically they are logged into the build server and responsible for keeping the nightly builds running and also for running lts builds once a week.
It is not unusual for the nightly/lts builds to get into a broken state, in which one or more packages needs attention due to compilation errors, testsuite failures or other breakage.
If someone else goes and pushes in a PR which may look innocent, it can still fail in Stackage (eg many people don't bother to test their testsuites properly, or it nevertheless can fail in our env/package set) and add to the burden of the curator already struggling to get the nightly build going again. The same applies even more now to lts, which tends to be more fragile due to slower build cadence: though we don't see many lts-haskell PRs yet. Conversely if nightly is broken, even merging the PR won't help that much because it still won't get pushed out anyway until the nightly build is fixed... It is not like PRs get dropped or fall through the cracks, we get to them soon anyway. So I am not saying we can't do it, just that nothing is for free and we prefer the person on duty to control/curate what is going into nightly day-by-day and not take them by surprise. It can also simply lead to git conflicts if the curator is unaware and again create a little more work/hassle for them.
With more automation this would be less of a problem... I personally feel it is crying out for more automation, but we are all too busy for that - it is really only Adam who made commenter (which is a separate manually run tool) and the new lts-haskell workflow, who has contributed to improving our process situation lately.
Apparently @alaendle will be at the Ecosystem Workshop: he is familiar with the daily workflow.
Not sure what you mean about unrelated errors:
@juhp I mean eg failures like the one at #7405 which seem unrelated to the PR itself. I've had this happen before, and it's a bit confusing to the contributor, it masks the status of the PR, and requires more back and forth communication and retries.
Yep it happens frequently - improvements to the CI would be welcome of course
From looking at the script run in CI, I guess the build failure is because CI is doing the same thing that the nightly run does? So if the nightly happens to be broken, CI will break in the same way.
The immediate thought would be to instead run CI against a known-good snapshot, such as the last completed nightly. That would provide better feedback to PR authors. But it would not be good feedback to curators. @juhp has pointed out that they probably don't want to actually merge anything until nightly is actually fixed --- a red CI is a nice reminder that something needs fixing.
So, I don't have any immediate useful thoughts.
With more automation this would be less of a problem... I personally feel it is crying out for more automation, but we are all too busy for that
We can encourage more bandwidth by clearly stating what we'd like to see automated and motivating contributors. :)
Do you have specific goals you'd like to achieve with automation? Less manual work for curators is the obvious one, I guess? But that's rather vague, so more specific goals would be better.
And do you have any specific ideas for how to accomplish the goals?
A side comment: Atm, stackage actively discourages curator applications: https://github.com/commercialhaskell/stackage/blob/46c83d41881fe3bb16fd43bb43174b663b5d38dd/become-a-curator.md
We are no longer accepting applications, please wait until next time
So we can actually not have the problem we are discussing here... ;-)
A side comment: Atm, stackage actively discourages curator applications: https://github.com/commercialhaskell/stackage/blob/46c83d41881fe3bb16fd43bb43174b663b5d38dd/become-a-curator.md
We are no longer accepting applications, please wait until next time
Perhaps that could be better phrased, hehe - but actually it's quite a relevant comment you made. (We are actually one curator down currently :cry: - so maybe look out for a call before long.)
Just to give some more color/context: I would not say we are actively discouraging new curators, but also not actively seeking them continuously either.
The general philosophy has been to do "recruitment" periodically when needed. It might be an opportunity to expand the team slightly soon.
As it stands we are each only on duty roughly every other month for a week, so an even bigger team would maybe put us individually further out of sync with the flow of events. There are pros and cons to having a larger team: perhaps we could consider speeding up the roster cadence but one week chunks seem a convenient time-frame, if slightly long perhaps. At the other extreme if we rotated every day it would get quite chaotic with the overhead of rapid issue handovers etc.
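To make the roster trade-off concrete, here is a small sketch of a weekly round-robin. It assumes a plain rotation in a fixed order, which the thread does not actually confirm; the curator names and the `epoch` start date are placeholders. Under that assumption a team of N with one-week shifts puts each person on duty every N weeks, so "roughly every other month" would correspond to about eight or nine curators.

```python
from datetime import date
from typing import List


def on_duty(curators: List[str], day: date,
            epoch: date = date(2024, 1, 1)) -> str:
    """Curator on duty on `day`, rotating weekly from `epoch` (a Monday).

    Assumes a simple fixed-order round-robin; the real roster may differ.
    """
    week_index = (day - epoch).days // 7
    return curators[week_index % len(curators)]


def weeks_between_shifts(team_size: int, shift_weeks: int = 1) -> int:
    """Weeks from the start of one shift to the start of the same person's next."""
    return team_size * shift_weeks
```

A function like `on_duty` is also roughly all that would be needed to publish who is on duty this week, which was one of the suggestions earlier in the thread.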
I feel the better thing would be to automate Stackage more: commenter is a good step forward but it only handles disabling of packages not fixing bounds yet. If it could do that semi-automatically that would be a big win or game changer: maybe we should also rewrite it in Haskell - not sure on the performance hit. :wink:
https://github.com/commercialhaskell/stackage/blob/master/CURATORS.md#pull-requests describes the stackage PR process. I'm not sure it's working as intended. Lately I find PRs typically take days, weeks, multiple pings/monitoring/followups to get merged. (The CI tests are unreliable, also, with seeming false failures.) This discourages upstream maintainers like myself and others trying to improve stackage snapshots and fix the Haskell ecosystem's abundant papercuts. Conversely, low friction and fast feedback loops would encourage more contribution.
What could be done ? Could processes be streamlined ? Is the stackage curator team underpersonned ? Is curation work all thanks to volunteers at present ? What about funding from the Haskell foundation - a stipend for curators, or paying for a regular part-time curator ? What if stackage curators were present in the #haskell-stack:matrix.org room so that we could discuss and resolve issues more efficiently ?