Open tbenz9 opened 6 years ago
I can see the benefits of release branches, especially where it comes to helping keep a more stable master branch. Additionally building that workflow of having larger release branches now would benefit us in the long run as the dev team continues to grow and we potentially get to the point where we have multiple teams working on parallel projects. If nothing else I think it would make it easier to quickly see what all the changes for that release were so we can more easily communicate that to the community.
So new features will always be developed on branches of the "main" dev branch?
Exact implementation can be discussed. My initial thought is that: New features are developed on feature branches (exactly how it is now). Those feature branches will be merged into a "release branch" one for 1.3.4, 1.3.5, etc. Those release branches will be merged into master on day of general release.
Feature Branches (many) -> Release Branches (few) -> Master Branch (one)
Thanks for raising this up, @tbenz9. I think it's good to re-evaluate git workflow every once in a while.
That said, I think the current workflow is pretty close to optimal for Sia's needs at the moment.
Master is always stable. This is best practice for situations like Siaberry where it pulls master every Tuesday.
I don't think this should actually be a goal. The latest release is always stable, which I think is sufficient and what most clients expect. I don't know of any open source project where users are told that they can run an arbitrary snapshot of the master branch and expect release-grade code.
Siaberry in particular follows tons of poor practices out of ignorance. I reported to them two months ago that they were putting their users at risk by blindly upgrading to the latest master branch and they still haven't fixed the issue. They should be a non-consideration for deciding which workflow is ideal.
Development can begin on larger features earlier. Feature branches could be merged into any release branch based on the roadmap. In other words, features could be staged for future releases.
Isn't this already possible? Nothing is preventing feature branches today.
Easier Minor and patch releases. I believe Sia should be moving to more frequent patch releases for bug fixes with minor releases adding new features. (this is probably a separate discussion but this workflow would set us up for more frequent patch releases)
I don't think changing the branching strategy affects this. If we wanted to release a minor patch to 1.3.2 today, we could just check out the v1.3.2
tag, then cut a new release.
Am I misunderstanding?
Change is hard, this would require all Sia devs to modify their workflow slightly. ... Current outstanding PRs would need to be modified or resubmitted.
These are big. Not just for Nebulous Sia devs, but every Sia dev and contributor. Everyone who wants to contribute to Sia now has to learn which branch is the correct branch to submit to.
The only change I'd suggest is a small one.
Testing release candidates is very expensive. It requires contributors (and I'm assuming core devs) to run through a lot of scenarios manually to expose bugs that we can't catch with automated testing. Every time we have to cut a new release because RC1 was not stable enough, that's a huge setback and time sink for everyone involved.
Ideally, we should be spending as little time in RC as possible. We should aim for a single release candidate that we test for the minimum amount of time we need to give us confidence that it's stable. It looks like now we're considering ~2 weeks of no critical issues reported with the RC to be a success, which I think is about right.
Here are the stats for the last few releases:
Version | Total Development Time | Time in RC | Failed candidates |
---|---|---|---|
1.3.3 | 2.5 months (and counting) | 1 month (and counting) | 2 (so far) |
1.3.2 | 3 months | 1 month | 2 |
1.3.1 | 5 months | 3 months | 3 |
I think we're pretty far from where we want to be. Spending 1/3 to over 1/2 of the development cycle post-release-candidate means that we're either doing insufficient automated testing and/or we're practicing poor release hygiene. My proposal addresses the latter issue.
In other words, pencils down when we are feature complete on the upcoming release and we have all necessary tests to give us confidence it's stable. At that point, we create a release branch like 1.3.3-rc
. We continue doing normal development for version n+2 in master
. We only touch the *-rc
branch if release candidate testing reveals a showstopper bug.
A bug is considered a "showstopper" if:
In the case of a showstopper bug, we make a fix directly into the RC branch, then forward-port the change to master
.
Every change to Sia introduces a risk of regression. Because we continue to make changes to Sia after we're feature complete for version n+1, we're needlessly increasing our risk of regression even though we've already met acceptance criteria for the release.
After we release an RC1, we're currently still checking changes directly into the same master
branch that we use if we need to create an RC2. This means that if there are bugs in RC1, any change we made between RC1 and RC2 could introduce new bugs.
By doing this, we're not only increasing the risk of regression, but increasing the difficulty of debugging. If we see an issue in RC2 that we didn't see in RC1, it's a lot easier to trace the cause if we've only made one small code change to the code between RC1 and RC2. If we're accepting arbitrary PRs into the codebase that will become RC2, it becomes much harder to figure out if a new PR introduced a regression or if the issue was there in RC1 and just was never discovered.
Chris quickly and kindly fixed an issue I reported where Sia didn't give users an option to turn off contract renewal without deleting their contracts and data. I was certainly happy to see this feature go in and I look forward to it in 1.3.3, but if we were following my proposed release strategy, that PR would go into master
and not any 1.3.3 branch.
I'm certainly eager to see it go in, but it's also a fairly decent regression risk because it's hard to thoroughly test the allowance code. This fix wasn't a critical feature for 1.3.3 nor a regression that 1.3.3 introduced, so we would wait on it until 1.3.4.
Thanks for the writeup @mtlynch. I agree with your main points: the current process works pretty well for us, but our handling of RCs could be improved. Once we get v1.3.3 out the door, we should take some time to evaluate your proposal and maybe try it out for v1.3.4.
My general thoughts here are basically that Sia is still pretty early stage, still cycling very rapidly. Each RC that we've failed has typically been due to multiple high-priority bugs, and generally they tend not to be regressions, but just deep-seated bugs that have been around for many versions, and are only now getting caught because people continue to use Sia in more advanced ways, especially as each release unlocks more things that you can do with Sia.
It doesn't bother me much that we currently spend nearly half of our dev time in RC, because right now our RC time is actually really valuable. We clean up huge parts of the user experience and fix many lingering and exposed bugs that have been problems for users for a long time. As long as the RC process continues to get hung up on issues like the ones we've been seeing, I will continue to see it as time well spent.
I think this release flow is worth re-visiting in a year though, when things have progressed to the next stage. My guess is we'll still be in a 'rapid iterate, many usability fixes each RC' type state, because we do have a long road of user-critical updates ahead of us, but perhaps things will have matured/mellowed a bit more.
It doesn't bother me much that we currently spend nearly half of our dev time in RC, because right now our RC time is actually really valuable. We clean up huge parts of the user experience and fix many lingering and exposed bugs that have been problems for users for a long time. As long as the RC process continues to get hung up on issues like the ones we've been seeing, I will continue to see it as time well spent.
I'm not sure I follow. We're talking about bugs that have existed in previous releases but have only just been reported during the RC for the new release? Why delay a release over that? If a bug in 1.3.2 is so minor that someone only noticed it during focused testing of 1.3.3 RC1, that doesn't seem like it meets the bug bar.
If someone waited until 1.3.3 RC3 to make a totally new feature request (unrelated to 1.3.3 changes), would we hold up the release over it and roll over to RC4? If not, it doesn't really make sense to delay a release over a bug report of an issue that's unrelated to changes in the release.
My general thoughts here are basically that Sia is still pretty early stage, still cycling very rapidly
The current strategy prevents rapid iteration.
If 1.3.3 RC1 is stable modulo some issues that have existed since pre-1.3.3, instead of spending 6 weeks iterating on 1.3.3 with new RCs, we could promote RC1 to the official release and then in 6 weeks, release 1.3.4.
I strongly recommend establishing release criteria and not deviating from them except in extraordinary circumstances (e.g. an exchange needs an ASAP fix).
For example, the criteria for 1.3.3 would probably be that it adds the /renter/stream/
API, fixes selected accounting bugs, and whatever else was a goal for the release. If some idiot comes to you during RC and complains about Sia destroying all their contracts, you can tell them, "We're very sorry, but that's always been there and you're the first to complain about it, so the fix will ship in 1.3.4."
If the release criteria is just ad-hoc when people stop asking for things during RC, it only withholds new features from users for weeks/months even though they're complete and stable.
For example, the criteria for 1.3.3 would probably be that it adds the /renter/stream/ API, fixes selected accounting bugs, and whatever else was a goal for the release. If some idiot comes to you during RC and complains about Sia destroying all their contracts, you can tell them, "We're very sorry, but that's always been there and you're the first to complain about it, so the fix will ship in 1.3.4."
Going to disagree on that one. If someone comes forward with a bug that's destroying all contracts, or comes forward with a bug that shows the renewal system is wonky / haywire, or comes forward with an easily triggered memory leak or an overflow-panic, those are all things worth pushing a release back over.
The reason we haven't seen these bugs before is because people haven't been able to get the software to the point of renewals before. Or if they have, other bugs were severe enough that the renewal bugs went unnoticed. For a long time, people were just like "yeah I have 135 contracts now" and we didn't know why.
Each of those fixes were independently only a few days of work, but contribute substantially to the overall stability and usability of v1.3.3. The alternative is to release a v1.3.3 that's still quite unstable.
People pay a lot more attention to problems and bugs during the RC phase. Bugs during RC that get discovered and then backlogged tend to remain backlogged for a long time. And then when you come back to it, nobody remembers anymore exactly how to reproduce it, logs might have been lost or deleted or buried, and the exact context of the bug is a lot fuzzier.
Sia still has a lot of bugs in it. When people are giving the buggy parts a lot of attention, I think it's worth slowing things down and really diving into the bugs, building tests, making reproductions, and getting to the bottom of all the issues that are causing file and contract instability.
Bugs during RC that get discovered and then backlogged tend to remain backlogged for a long time. And then when you come back to it, nobody remembers anymore exactly how to reproduce it, logs might have been lost or deleted or buried, and the exact context of the bug is a lot fuzzier.
Sia still has a lot of bugs in it. When people are giving the buggy parts a lot of attention, I think it's worth slowing things down and really diving into the bugs, building tests, making reproductions, and getting to the bottom of all the issues that are causing file and contract instability.
We may be talking past each other.
I'm not advocating for backlogging issues at all. You would still investigate bugs from the RC and fix them the way you're currently doing. You'd still check the fixes into master
the way you're currently doing. The only difference would be that you'd have a dedicated branch for the release candidate. You wouldn't make any changes to that branch unless there was a serious regression or a critical bug in a new feature introduced by the release.
Just as a simple, hypothetical example, imagine that the allowance issue I reported (unrelated to 1.3.3 changes) was the only bug anyone reported during 1.3.3 RC2. Under the strategy I'm suggesting, everything would be the same about the fix process. Chris would still investigate it and check a fix into master. But instead of then incrementing to RC3, you'd declare RC2 a success and cut an official 1.3.3 release based on the commit used for RC2. The fix for the allowance issue would still be done, but it wouldn't come into effect until 1.3.4.
was the only bug anyone reported during 1.3.3 RC2
heh. I wish.
https://github.com/NebulousLabs/Sia/pull/3058, https://github.com/NebulousLabs/Sia/pull/3050, https://github.com/NebulousLabs/Sia/pull/3030, https://github.com/NebulousLabs/Sia/pull/3029, https://github.com/NebulousLabs/Sia/pull/3023, https://github.com/NebulousLabs/Sia/issues/3022, https://github.com/NebulousLabs/Sia/pull/3019, https://github.com/NebulousLabs/Sia/pull/3014
All are related to bugs discovered or reported by developers and users (through one channel or another) during RC2.
The fix for the allowance issue would still be done, but it wouldn't come into effect until 1.3.4
I don't feel content with that because I know that v1.3.4 is not going to come in the next 60 days, and likely (due to the ambitions we have for the renter) for close to 120 days after all testing and quality control is completed. Something that has such a small implementation surface yet such a big impact to users shouldn't be held back multiple months because we aren't willing to compromise our release schedule. Especially when you've got 8 such fixes, and many of them are things that app developers have been complaining about.
edits: I had 10 PRs linked originally, but 2 of them are actually RC3 fixes, not RC2 fixes.
I'm glad to see all the discussion here. In general it sounds like everyone is on the same page that improvements can be made, it is just a matter to making a few changes at a time to figure out what works the best.
Starting with v1.3.4 we will be using Milestones again to better track what issues we are targeting to complete for each release, with our main goal of the release being the determining factor of how much else we can do. For example with v1.3.4 the main goal will be the new sia file format, so that will be our critical path task for the release. We will add and remove other issues to the release depending on priority and if we believe we can get them done without holding up the release. I will be managing the Milestones and will do my best to keep them as up to date as possible.
I think this will help at least clarify what is being included in each release and then will hopefully work well with whichever path we decide to go down in terms of release branches.
Thanks everyone for participating in the discussion. I initially started this proposal while looking for an easy fix for (1) the documentation being ahead of the latest release, and (2) attempting to keep master branch stable. While I think RC branches, Milestones, established release criteria, etc, are all worthwhile topics, none of them address the documentation or master stability issues.
It sounds like not everyone agrees that master should be stable, (would a Sia dev care to comment on the idea of a stable master branch?) but we have all agreed that the documentation being ahead of the latest release is a problem. @tbenz9 suggested release branches as a fix. @DavidVorick suggests "Maybe we can extend our documentation practices to state what version an endpoint was most recently updated? That way people looking at the docs know whether or not they need to upgrade".
I think this release flow is worth re-visiting in a year though, when things have progressed to the next stage.
I politely disagree with this, early in a development stage is the best time to establish good workflow practices and it'll only get harder as the team grows and the code increases complexity.
The impression I get is that @DavidVorick is not likely to adopt a workflow change this year so perhaps this proposal should be closed, and a new one opened to discuss the documentation problem directly.
I wouldn't mind switching to a new workflow. Maybe something simple like.
Workflow: We would use Dev the way we use master right now. When we decide to create a RC, we merge Dev into Testing. Fixes for the RC are then pushed directly to testing until the RC is ready to be released. Once we are ready for the release we merge Testing into master and into Dev.
This should also make sure that the documentation represents the current release version on master.
I wanted to suggest release branching to improve our GitHub workflow. I was thinking a workflow like the following might be beneficial as Nebulous grows the Sia dev team.
Pros:
Cons:
Other: