[Feature Request] Incremental rollouts

dio-gh commented 3 years ago

Figured this would be the best place to discuss this, to enable it being documented for those that do not visit the Discord server.

Yesterday on Discord the topic of regressions and stable builds came up once again. It's a common occurence and has always been, even before I joined the project years ago.

Current state of affairs:

Currently, the release strategy is what could be best described as YOLO: changes are PR'd in, and if they pass code review (which is frequently lacking/omitted), pass CI (that lacks automated testing), and no users report issues under the PR thread (rare), stuff's merged into master and rolled out immediately to everyone.

It is not surprising then that regressions crop up rather often; so severe sometimes, that the Discord bot has facilities for marking builds as broken to disadvise users who visit #build-updates from updating - and @AniLeo is also notified, to make the website API stop advertising the new build to the updater (or at least, this is how I know).

Commonly advised solution:

The common advice is that the project should switch to holding back releases; namely by separating them into stable and early channels, allowing users to control the experience they'll likely have. This is in line with most desktop software, as well as other emulators, like PCSX2 or Dolphin.

This approach is not without significant pitfalls, however. A common issue with a release strategy like this is overzealously holding back a new stable release. PCSX2 has just managed to fight its way out of this very problem, in fact. Significant changes made "upstream" (on the early channel) can also cause major conflicts. If there's a breakthrough change that either really improves compatibility, performance, UX, or all of these, everyone will probably want to receive that as quickly as possible, users and staff alike. Enabling such exceptions is a major management weight, and the omission of it can result in a significant number of people just automatically relying on early builds, making the whole exercise useless.

Another, perhaps even more significant, pitfall however is a rather obvious one: in order to be able to separate changes like this, one has to determine if the proposed stable build is actually stable. Frankly, this is not possible. We have a grand total of zero automated tests (which could change ofc), and a game library of several thousand titles (which won't, however). This is only compensated by the time the to-be-merged-to-stable changes would spend on early, and the number of people testing that - but this implies having to maintain a time difference that allows for that, and also actually fixing the issues that come up in that period, in a timely fashion no less, to avoid the situation laid out in the previous paragraph.

Lastly, it could also make for a worse PR-image. Currently, users see new updates almost every day; while obviously these are almost always incremental changes, it does clarify it to everyone that the emulator is actively maintained and worked upon, in lieu of new progress reports, blogposts, and videos. Taking this away, and giving users instead a stable that updates sometimes every x week I'd not expect to bode well in the long term.

My advised solution:

My advised solution is to extend the current build blacklisting strategy with incremental rollouts and build whitelisting. My approach would take two steps:

on RPCS3's side, generating a random token after the first launch that'd be sent with the update info request on every launch, allowing for determining unique clients;
implementing randomness on the updater API's side, that alters the chance the API will respond with a "yes there's a new build" to a request, from a unique client. The client should be able to flag that it wants exclusion from this randomness. Whitelisted builds should begin to roll out in 100% capacity right away. This feature would be for non-functional changes, like asset changes, typo fixes, readme updates, etc.. (not for fixes, not even hotfixes!!)

The flow would work like such:

on the updater API side:
- new build arrives
- gets advertised with e.g. 20% chance for some time duration,
- rinse repeat with increasing chance stages and different durations
- after a certain time, say 24h, it either reaches 100%, or a new build already arrives
- if a new build arrives, the current build becomes advertised to everyone, and the new build starts incrementally rolling out instead (or if it's a whitelisted build, immediately)
- if a build is severely broken, blacklisting can be engaged as usual
- on RPCS3's side:
- two strategies should be now available, incremental (new default) or immediate
- it'd update normally otherwise, blind to the ratios and time delays.

What would this result in, then? This strategy makes a compromise; it'll prevent broken builds being distributed to a certain percentage of users only. What this enables, is us to learn about the issues first, and come up with either workarounds, or blacklist a build semi-preemptively. This will protect most users from the most severe of regressions and breakages, and also shield the staff from a flood of complaints and bad publicity stemming from frequent broken builds. It also makes RPCS3 more easy to recommend, without one having to fear that the build at the time of download will be a broken unstable mess.

This strategy also comes with only a mild implementation cost, and a very mild increase in management weight. Issues that are specific to a new build will be both easier and harder to track:

easier, because you will have a better idea which PR might have caused a given issue;
harder, because an entire build being broken for possibly everyone will not cause the aforementioned flood of complaints.

However, this strategy also keeps every benefit we have with the YOLO approach - users will still see a quick rate of updates, and we'll be able to release more important updates to people on a relatively short notice still.

Note that the main download page should only advertise the last 100% rolled out build.

Overall, I'd argue this would be a reasonable change in the release strategy, and would be pretty much a net benefit. It'd also enable us to more accurately gauge the number of active users, as a bonus.

What do you guys think?

Megamouse commented 3 years ago

While this seems like a good idea for the users, it kind of makes it harder to respond to regressions. The more users test a build, the better. If only 20% test it, and the regression pops up when everyone gets the build, then nothing was gained. Meanwhile you'll have people ask why they can't download the build yet when we tell them that it's already merged. idk Keep in mind that this Emulator is far from having any sort of stable release. (what are we now? v0.0.0.0.0.12 ?) And most of the time we want to fix stuff as fast as possible for everyone (and i don't mean regressions)

dio-gh commented 3 years ago

That's true, but that's also an unsolveable catch 22 that will never change. This is why I'd prefer incremental rollouts to gated releases; it shifts the focus from trying to eliminate breakages completely (unrealistic, and even unfeasible in our case), to shielding the most severe issues from most people. If an issue is really severe, it's more likely to be reported; if an issue affects everyone, even in a reduced audience, the trend will still emerge.

It scales with issue severity and retains update velocity. This is why I'd vastly prefer it to the conventional stable/early strategy and say that it would be a worthwhile effort to implement even right now.

And most of the time we want to fix stuff as fast as possible for everyone

This thought process is what actually causes massive stalls in the release of stable builds usually. Wanting to fix everything, under a time limit is just not sustainable or scaleable; you'll either end up burning out, or putting the next stable release off again and again. It also puts you in a bind when it comes to deciding between working on new features vs. addressing issues.

Incremental rollouts would enable you to stick around for a couple hours after your PR is merged, see the reactions, then if it messed things up, have the bad build blacklisted and a fixed build rolled out, before it even reached the majority of users. It also wouldn't mess with the stake / pressure that's put on the staff for fixing things. If a regression is rolled out it's rolled out - this strategy change isn't meant to provide a complete harness, it's meant to facilitate the easier and more conscious tracking of bugs and regressions, and curbing sudden bricks and silly mistakes.

Meanwhile you'll have people ask why they can't download the build yet when we tell them that it's already merged.

Good point I 'spose, but it's easily solveable with some GUI memery, like making a force update button, that does an immediate update, but doesn't change the updater preset (so that it stays on incremental).

Keep in mind that this Emulator is far from having any sort of stable release. (what are we now? v0.0.0.0.0.12 ?)

The "we're in alpha" adage is a very convincing argument (excuse?) and works for a long time if not indefinitely, but if the emu is indeed a long term project (it is), then what ends up mattering is the process, not the product. This change in release strategy requires relatively minor effort to implement, and would provide a long overdue harness for user experience. RPCS3 has been under high profile attention in the emulation community since 2017, which was 4 years ago - and our status is still alpha. If we're in this for the long haul, we should play like we're in this for the long haul.

Nekotekina commented 3 years ago

Latest blacklisted PR had been opened and available for testing for ~10 days. And fixed in few hours after merge. No idea what's all the fuss is about.

My proposal would be a GUI feature which lists all releases starting from most recent, some information (like the commit descriptions) and the ability to downgrade manually. This also simplifies bisecting issues a little.

Forgot to mention, tracking or discriminating end users in any way is absolute no-no. Doesn't matter how good the intention might be.

AniLeo commented 3 years ago

I don't think allowing users to downgrade from within the emulator is a good idea in the slightest. We want to incentivize updating, not the opposite. You'd end up with more "tutorials" telling people to get a certain build there and people would end up doing it without even knowing why, and thus more people appearing with ancient builds. If users want to downgrade, they either know how to, or don't and the UI won't help them realize that either way, as they may not even associate the issue being from a newer build in the first place. I also don't agree with randomizing updates nor with having to deal with keeping unique identifiers for that purpose.

The solution here, I believe, has to be done from our side:

Auto-testing certain basic tasks, such as opening the emulator, testing firmware installation, testing package installation, booting a game and seeing if there's any output at all, to name a few, to avoid the most breaking issues. We should be able to use software rendering on the CIs thanks to lavapipe too to see if a game displays anything when booted.
Better reviewing of PRs, including not merging them immediately, but rather leaving an approval status and wait to see if anyone else comments and/or tests them, and if not then merge after a couple of days, depending on what the PR is about.
Another solution would be to have two update channels: an "unstable" update channel where select open PRs go to for testing and then the regular update channel with already merged PRs. We'd have enough interested users on open PRs to iron out all the big breaking issues.

Nekotekina commented 3 years ago

I don't see any way that prevents people from downgrade or use custom builds, be it from tutorial or not. People are free to use any build they wish, literally no need policing them.

However, being able to downgrade to previous installed version in "literally one click" if something goes wrong may actually make people fear less to enable autoupdates and test latest versions (just need to keep the history of updates). More advanced UI could assist bisecting a lot without manually downloading builds or using 3rd party scripts. And finally, if someone uses automatic downgrade, it can send short anonymous report to the website (simply two versions, current and previous). Perhaps it could be analyzed, if downgrade happens shortly after the release.

Of course, as a form of a telemetry, it should be opt-in or opt-out, but this is QoI question.

HerrHulaHoop commented 3 years ago

For the automatic downgrade bit, we could add a separate updater exe that does the same function as the website and Discord #build-updates channel: Show the complete list of versions available with links to download and changelog with the commit message or PR description. Baking it into the emulator might lead to some headaches with the emulator overwriting new data with outdated ones.

dio-gh commented 3 years ago

@Nekotekina

Latest blacklisted PR had been opened and available for testing for ~10 days.

This is exactly what I was referring to here:

and no users report issues under the PR thread (rare)

People hopping in and testing is just not reliable enough.

And fixed in few hours after merge. No idea what's all the fuss is about.

This year in data so far:

Click to expand

| Date | Number of consecutive broken builds | | --- | --- | | 2021. 04. 09. | 3 | | 2021. 03. 14. | 1 | | 2021. 03. 10. | 1 | | 2021. 03. 09. | 1 | | 2021. 02. 23. | 8 | | 2021. 02. 22. | 1 | | 2021. 02. 15. | 1 | | 2021. 02. 14. | 1 | | 2021. 02. 09. | 2 | | 2021. 02. 06. | 1 | | 2021. 01. 29. | 1 | | 2021. 01. 28. | 3 | | 2021. 01. 14. | 4 | | 2021. 01. 12. | 5 |

There isn't a month without a handful of outright broken builds, affecting all users, new or present in the timeframe of the breakage. This is really not a statistic to be proud of, especially if there's a way to avoid it completely. And that would be incremental rollouts, is what I argue.

Forgot to mention, tracking or discriminating end users in any way is absolute no-no.

I'm not really sure what you mean by discriminating? As I outlined in reply to mega's comment, there could be an option to force the update process, bypassing the randomness.

As for tracking, I'll go ahead and make it clear that I'm absolutely pro-telemetry when it comes to open source projects, and especially open data. Knowing exactly how many active users the emulator has could be absolutely indispensable when it comes to decisionmaking and measuring the impact of breakages; and there's no reason not to share this data back to the public. It's also not feasible to have number of active users type tracking opt-in/opt-out, since that defeats the purpose.

This being said, there's a way to drop the tracking from the equation. If the randomness was instead implemented on clientside, not serverside, the issue would be no more. On the flipside, stages would have to be hardcoded in, which may or may not be an issue.

My proposal would be a GUI feature

I do like this idea though, has been thrown around years back too, if I remember correctly. Hell, this could be even automated, based on issue tickets; as in, marking a game to use version x instead, and so the emu would grab that, and launch it with no-gui, making it seamless. This might be overthinking it though.

@AniLeo

We want to incentivize updating, not the opposite.

??? Having a feature that allows you to pin a build, especially if automated, would in no way incentivize not updating. If anything, it'd incentivize it more: you could keep updating RPCS3 while being able to trust the fact that you can easily fall back to using a previous one per-game, without any extra gymnastics like launching a different executable, messing with your shortcuts, etc.

The argument "If users want to downgrade, they either know how to, or don't" is in no way weighs more than users knowing if they should keep pinning to an old build or just let the game use the same version as every other game they have.

Auto-testing certain basic tasks Better reviewing of PRs

I don't want to be blunt, but good luck. Implementing incremental rollouts would be a one off, hour long business. Conjuring up sturdy enough automated testing and magically doing a better job at reviewing pull requests is a possibly never ending slog, and is basically the problem at hand. Pretty sure I addressed this above. Don't tell me this issue ticket having just been opened is what would suddenly incentivize such transformative changes in workflow; the problem itself and it's causes are known ages ago.

It's also not like it's mutually exclusive though, mind you.

an "unstable" update channel where select open PRs go to for testing and then the regular update channel with already merged PRs

Which incurs a notable management effort. Are you going to pick and choose PRs for this? On what basis? If not you, then who? What happens if you go unavailable (also an issue at present, btw)? This is a bit like mixing together the concept of stable/early builds with a watered down A/B testing, but without actual feature separation, randomization and "guaranteed" stability.

@HerrHulaHoop

we could add a separate updater exe

Please no.

HerrHulaHoop commented 3 years ago

Since you've already dug up the data, could you please add the number of days it took to fix the regression. That would really help provide context to the amount of exposure these regressions had.

dio-gh commented 3 years ago

By the next day at most, since I mentioned the number of consecutive broken builds per-day.

And it really doesn't help contextualizing the exposure, precisely because we do not collect telemetry that was mentioned right above.

HerrHulaHoop commented 3 years ago

Hmm that's what I wanted to know. 8 broken builds ain't an issue if it was in a span of 6 hours but 2 broken builds are an issue if it takes 2-3 days to fix it. So the chance of users downloading broken builds through the updater and being stuck on it is what matter.

If our devs are always prompt in fixing regressions, then as Neko said, we're overthinking this. User who accidentally downloads a broken build will have to wait at most a day to get it fixed.

dio-gh commented 3 years ago

Again, you do not have data on how many people actually received dud builds, and how many of those were first time users that do not know better. Timeframes and number of broken builds is barely enough for this guesstimation.

But of course, if you're unconvinced that this is a problem with all this at hand, despite the low amount of probable effort required to curb it, not much more do I have to convince you.

dio-gh commented 3 years ago

48 hours after the initial post and 24 hours after the last reply, I consider this request ticket to be Rejected by the team, based on the responses received so far. Should the team claim otherwise / change its mind and decide to implement incremental rollouts, this ticket should be reopened. Otherwise, different changes implemented or theorized in relation to the issue the incremental rollouts would have aimed to solve should not cause the reopening of this ticket.

Thank you for your responses.

RPCS3 / rpcs3

[Feature Request] Incremental rollouts #10092