I2I: Create stable release channel with slower release cadence

rcebulko commented 4 years ago

AMP Monthly Releases

Background

Today, a new version of the AMP runtime is released once a week. The release cycle involves cutting a new canary branch each Tuesday, releasing it to a subset of users in two modes (one with canary flags turned on, and one with them turned off), and then releasing the same version to production the following week, modulo cherry-picks.

In the near future, we are looking to introduce nightly builds into the mix, where each week’s canary release will be based on the previous week’s most stable nightly build. For the purpose of this document, it’s sufficient to note that the nightly build process is not going to change the weekly cadence of production releases.

For this document, lts will be used to refer to the approximately-monthly release flavor(s) so as not to tie it to any specific schedule. Flavors will be referred to as lts and lts-rc, the counterparts of prod and rc. This does not account for discussions about renaming current release channels (discussions which include renaming prod to stable) as that would confuse matters here.

Problem Statement

External partners of AMP are faced with the problem of having to ensure quality between releases and avoid regressions to their websites. As the AMP project has grown in size and user base, the need has arisen for a version-locking mechanism that has a slower cadence than the current weekly release cycle.

It’s worth drawing a contrast between a version-locking mechanism (requires action on the part of a website) and our current canary / RC release mechanism (requires opt-in by users, or served randomly to a small percentage of requests). Today, the only way for a partner to lock their website to a specific version of AMP is to pin it to a specific RTV version, which will require regular updates on their side, and causes the page to become an invalid-AMP page, which also makes it ineligible for inclusion in AMP caches, affecting preloading speed benefits.

Therefore, we need to do two things:

Create a new release channel for AMP that is updated less frequently than our normal v0.js release
Offer partners a simple way to control the AMP version used by their website regardless of AMP’s release schedule

Non-Goals

Allowing use of the new channel for AMP emails, ads, Stories, or Actions

Current release cycle

Today, AMP releases are done in phases:

[RELEASE] Compile the runtime and create four flavors with the same version with four prefixes: canary (00), rc (03), control (02), and prod (01) and write them to an RTV directory.
[PROMOTE] Push the canary and rc flavors within 24 hours to opt-in users and a small percentage of all requests
[PROMOTE] About a week later, push the prod (and control) flavors to all users.

As a result, the release cycle is subject to the following constraints:

Websites that use v0.js will get a new AMP version every week
To avoid this, a publisher may pin their website to a specific RTV version

Things to consider

AliExpress, one of AMP’s largest partners, would like to version-lock AMP to one of the last N versions. What is a good value for N? See: https://github.com/ampproject/amphtml/issues/20903
- N = 4
We’d need to ensure that a website that opts into a version-locked release receives the same version of the core runtime, extensions, and all other build targets
- Runtime already handles this.
The mechanism to create a lts release should ideally fit into our existing release mechanism (that includes the AMP_CANARY cookie, and the canary, rc, control, and prod releases)
- This may not be necessary, given that the lts release will be an exact mirror of a previous prod release. There will be no experiments or need to canary a build to a random segment of the population, so it’s possible most of these cookies could be ignored. The only exception would be opt-in, which would allow partners to test a new build on their site manually.
Vendors want stable builds, but there are cases where we'll need to change the build anyway, such as a security or privacy issue. We should make it clear that "stable builds" may change in some circumstances.
- This is true, but since the lts release would be over a month old, presumably it would be slightly more rare that a cherry-pick would be required
What is the bar for cherry picks to lts builds? (For weekly releases, the bar is a P0 issue. Consider using the same bar)
- See below
- Any cherry-pick into the weekly release that lts-rc mirrors would be included in lts-rc; cherry-picks should almost never be necessary for lts
In cases where a cherry pick is needed only for the lts version, will we create a new cherry pick just for the lts?
- It’s not clear to me if there’s any case where this should be relevant. I suppose that would have to mean there was an unnoticed P0 for over a month, that was eventually resolved in the weekly release process, but not through a cherry-pick. The release cadence should guard against most cherry-pick needs.
- https://github.com/ampproject/amphtml/issues/25417 was a cherry pick for a behavior that's existed for years
- In cases where cherry-picks are required, the process should be similar to a production cherry-pick

Potential solutions

Release Cadence

For the purposes of explaining the cadence, I’m going to adopt the convention of referring to the release period as a “month”, which is assumed to have exactly 4 “weeks”. Due to actual month/day counts this will instead equate to a rolling 4-week window, but explaining in terms of months will make the approach easier to grasp. I will refer to releases as release[month][week] where week is 1-4.

During the [PROMOTION] phase of release[N][1], identify the “best” prod release from month N-1; note that doing this at the [PROMOTION] phase of release[N][1] ensures that even the latest prod release (release[N-1][4]) has been in the wild for at least one full week.
Promote lts-rc to lts.
Assign lts-rc to the release chosen in step 1

Some effects of this cadence, in no particular order:

Actual lts cuts may be very close together (ie. Nov-w4 and Dec-w1 above) or far apart (Dec-w1 and Jan-w3)
Each lts cut will be lts-rc for a full “month” before reaching lts
When lts-rc is assigned, it will be at least one week and at most four weeks since it was promoted to the weekly prod
When lts is released, it will be at least five weeks and at most nine weeks since it was promoted to the weekly prod

LTS Release Candidate

There are two similar but subtly distinct problems third-party clients face which could be addressed by lts releases:

There is too little time between releases for sites (ex. AliExpress) to fix new issues
There is too little time between a canary being cut and its promotion to prod for sites to identify issues before they reach production

Releasing approximately monthly solves the first problem. By also providing a lts-rc, we give the third-parties a month-long window to validate the new version on their site. It would be possible to abandon the lts-rc channel entirely, and let lts operate in its place with a running delay of anywhere from 1-5 weeks.

Advantages

Four-week window for third-parties to validate releases before they are deployed to users (and RTV lock if they see a pending breakage)
Most or all P0s would be addressed long before a release reaches lts
Clean integration with existing rc/canary cookies and splits (ie. AliExpress automatically has users testing the next lts release candidate)

Disadvantages

Possibly more complexity in request rewriting, caching, etc.
More targets to cherry-pick into if bugs are found
lts can be behind prod by as much as nine weeks

Proposed choice: Provide a lts-rc, reachable with an opt-in cookie (not with any sort of automatic Mendel splits). This will allow developers of big sites plenty of time to see how their site will be affected by an upcoming release, without affecting the experience for users.

If the delay between prod and lts is a significant issue, the lts-rc window could be shrunken arbitrarily; lts-rc could serve lts except for N weeks before the next lts release. This gives us the option to decide how far in advance we want to give a release candidate for vendors on the lts channel to test.

rcebulko commented 4 years ago

Apologies for the odd formatting; converted from GDoc, need to fix it up

rcebulko commented 4 years ago

@ampproject/wg-caching What will need to happen to ensure that pages using ampproject.org/lts/v0.js are validated correctly? It should be possible for these partners to use the stable channel and still be considered valid AMP/get cache benefits

honeybadgerdontcare commented 4 years ago

Updating our validation rules to allow the new path is what is required. We can codify it as part of the ExtensionSpec.

These will still be served by AMP caches with the most recent version, correct? That is, AMP caches would rewrite the lts path to the current rtv.

rcebulko commented 4 years ago

AMP caches would rewrite its path to the "monthly" RTV, as opposed to the prod channel RTV

honeybadgerdontcare commented 4 years ago

At one point AMP was considered evergreen. The latest release was the release. Out of pure curiosity, why should AMP caches use the monthly instead of latest?

If a publisher includes ampproject.org/lts/v0.js they are explicitly stating they want an older version than the latest and AMP caches should rewrite accordingly then. Can we include a place to publish this information for AMP caches? We have caches.json for listing AMP caches. Could we update the draft to include a place to publish RTVs for these different scenarios?

Also, are we updating the best practices for caches to respect the publishers desire for the monthly as opposed to using the latest? Would it be acceptable for a publisher to use latest regardless?

rcebulko commented 4 years ago

How do we currently provide the RTVs for caches to use for prod, canary, etc?

The caches should respect the publisher's use of monthly, otherwise there would be minimal benefit to publishers like AliExpress using it since their users would end up with the prod RTV most of the time/any time they're seeing a page through the cache.

Using latest in the caches regardless of what the publisher requests would largely defeat the purpose of the monthly channel, as the sites using it would still have a majority of their traffic (ie. all traffic through caches) using the latest release, when the point of the monthly releases is to provide a slower and more controlled rate of change

rcebulko commented 4 years ago

Do all AMP caches rewrite unversioned v0.js to versioned rtv/###/v0.js etc.?

honeybadgerdontcare commented 4 years ago

I do not know what other AMP caches do, but Google uses the latest release.

rcebulko commented 4 years ago

Google "uses" the release by rewriting a request for /v0.js to a version-locked /rtv/######/v0.js (and the same for /v0/*.js extensions). So, when you visit a page that is being served through the Google cache, the src of the script tag actually has the RTV in it. This RTV is generally prod, but could vary due to AMP_CANARY cookie, experiments, etc.

I haven't checked all caches, but when looking at a page served from Bing's cache, there is no rewriting. The page just requests /v0.js, and the RTV selection occurs on our backend when the client requests it.

As a result, this means that AMP caches don't need to know anything about the RTV; the client will receive the cached page containing a <script> with /lts/v0.js as its src, and the client will request it directly, where the runtime serving logic can handle it as usual.

honeybadgerdontcare commented 4 years ago

Are you recommending caches not to use any RTV?

rcebulko commented 4 years ago

I don't think they do currently, and I don't think there's currently any proper way for them to do so. I also don't know that there's a need for them to, but if you think otherwise maybe it's worth starting a separate discussion thread.

honeybadgerdontcare commented 4 years ago

According to our best practices:

It participates in the AMP JS library release cycle and makes every effort to serve the latest version. It does not allow sites to perform version locking.

So this Intent to Implement deviates from that and seems worth discussing it's implications.

rcebulko commented 4 years ago

How does it deviate?

rcebulko commented 4 years ago

Do you mean for the CDN to serve the actual runtime itself or pages referencing it?

honeybadgerdontcare commented 4 years ago

@rcebulko For the cache to serve with the latest version (hence rewriting the URL). Moving to a monthly release instead of using latest is a change for caches. Perhaps we can elaborate more in a design review.

rcebulko commented 4 years ago

Unfortunately the next design review opportunity isn't until after Thanksgiving, so any discussion that can be reasonably had here is useful. I agree there will need to be docs/communications to make other caches aware of the additional path (probably something like in https://github.com/ampproject/amphtml/blob/master/spec/amp-cache-guidelines.md).

With that in mind, I don't know if maintaining an RTV map is what's called for. I'm not sure how caches currently handle the runtime. Do they just request the file and cache it normally, or do they have some API or source of knowledge about the exact RTV they're requesting? If it's the former, the additional mapping should happen pretty transparently. If it's the latter (which I'll need someone to weigh in on and confirm/deny), then presumably we would add the LTS RTV to whatever file or endpoint is providing the prod RTV.

Do we know what other caches do for /sp/v0.js, /mp/v0.js, and /esm/v0.js (and equivalent extension paths)? Maybe @erwinmombay can comment re: single/multi-pass.

rcebulko commented 4 years ago

@twifkak would you mind weighing in on what impact, if any, this may have vis-a-vis signed exchanges?

erwinmombay commented 4 years ago

@rcebulko for the /sp/ mp/ /esm/ paths, these are all experiment buckets which other caches are not aware of. These paths are temporary and will be gotten rid of by Q1

honeybadgerdontcare commented 4 years ago

I think the consensus is to add the LTS version numbers for both the runtime and CSS (e.g. the same number but one is a full URL) to be included in https://cdn.ampproject.org/rtv/metadata

rcebulko commented 4 years ago

@ampproject/wg-outreach When the time comes that this is done and readily available to users, do you have any recommendations with regards to documentation? My current thoughts are to include an explanation of the LTS channel on the basic markup page in the tutorial along with the AMP HTML Spec under the AMP Runtime section.

dreamofabear commented 4 years ago

+1 that choosing "best" week is probably not worth the complexity.

There is too little time between a canary being cut and its promotion to prod for sites to identify issues before they reach production

Do we have any data (or anecdotes) of QA cycles taking longer than 5 days to run?

I think we should seek to minimize "time-from-commit-to-prod" while achieving the intended goal of this feature.

rcebulko commented 4 years ago

Per design review (https://github.com/ampproject/amphtml/issues/25447), it has been determined that an RC adds additional complexity while selecting between weekly prod releases does not offer significant benefits. The LTS promotion strategy will be as follows, working off of a "month" of four weeks:

the previous month's week 0 stable release is promoted to LTS
the version promoted to this week's stable becomes the next LTS (effectively a "release candidate" without a dedicated channel)

The release tracking issue for a stable release promotion should include a checkbox indicating whether the release will eventually become LTS.

rcebulko commented 4 years ago

There is too little time between a canary being cut and its promotion to prod for sites to identify issues before they reach production

Do we have any data (or anecdotes) of QA cycles taking longer than 5 days to run?

I think we should seek to minimize "time-from-commit-to-prod" while achieving the intended goal of this feature.

I agree with this. If we promoted week 4 stable instead of week 0 stable, we'd still be doubling the QA window from one week to two. @rsimha based on your conversations with Malte and others, how significant is the need for a larger QA window? Do we have any concrete anecdotes or just a general "we want more time"?

rsimha commented 4 years ago

@rsimha based on your conversations with Malte and others, how significant is the need for a larger QA window? Do we have any concrete anecdotes or just a general "we want more time"?

Looping in @cramforce and @nainar to make a call on this. My understanding was that partners wanted to be able to start testing their websites against the next LTS release candidate immediately after bumping it to a given LTS version, and that they were willing to make the tradeoff of a slower release cadence at the cost of delayed adoption of new features.

If a 4 week QA period isn't a requirement, then yes, we can reduce the lead time for code making it to LTS by three weeks.

rcebulko commented 4 years ago

Relaying conversation with @nainar : most partners don't even do QA, they just want something more stable/changing less often. For those that do, doubling the QA window should be plenty. The promotion strategy will then be as follows.

Let LTS promotion week (every 4 weeks) be Week[N]. Week[N-1][canary] gets promoted to Week[N][production] as usual. Week[N-1][production] gets promoted to Week[N through N+3][LTS].

LTS will be behind master by a minimum of 2 and a maximum of 6 weeks (at the end of an LTS cycle).

@ampproject/wg-security-privacy do you see any security issues that may be relevant here? @ampproject/wg-viewers where is the appropriate place to document that migrations to the AMP Viewer API need a rollout window allowing for up to 6 weeks between HEAD and LTS?

honeybadgerdontcare commented 4 years ago

@molnarg fyi. I don't see any issue here. If there is a concern with a release AMP team can rollback to the previous LTS, correct?

rcebulko commented 4 years ago

@molnarg fyi. I don't see any issue here. If there is a concern with a release AMP team can rollback to the previous LTS, correct?

Yep, cherrypick/rollback procedures are the same as prod.

ericfs commented 4 years ago

The AMP Viewer API is documented here: https://github.com/ampproject/amphtml/blob/master/extensions/amp-viewer-integration/amp-doc-viewer-api.md With a high level explanation here: https://github.com/ampproject/amphtml/blob/master/extensions/amp-viewer-integration/integrating-viewer-with-amp-doc-guide.md

rcebulko commented 4 years ago

@ericfs I think it would be less useful in viewer API documentation (used more for consumers of the API) than somewhere...closer to the developers who are actually working on the viewer? Is there a contributor guide for making changes to the viewer? What is the release/rollout process like, and who controls it?

ericfs commented 4 years ago

Are you only interested in the Google Search Viewer? There are multiple AMP Viewer and some parts are shared and some are not. If you're looking for Google specific docs, I can share that internally.

rcebulko commented 4 years ago

I am not only interested in Google Search Viewer. Let me explain a little better:

With the introduction of an LTS channel, the release available through /lts/v0.js would be a version of the codebase that was created off of master anywhere from 2 weeks old (when the LTS is promoted) to 6 weeks old (when the LTS is about to be replaced by a new promote). LTS will still be considered valid AMP, and so can still be served by caches (see validator changes). This means there will be a wider spread of versions which will need to be supported by any AMP Viewer. While the AMP Viewer API is extremely stable, any API changes would need at least a six-week transitionary period. An update to both the Viewer and the AMPHTML runtime at the same time could result in it taking up to six weeks for the runtime API change to reach LTS, and so Viewer API changes must be backwards-compatible during this time.

I'd like to document this for Viewer developers, so that any major API changes or migrations are aware of the timing requirements.

dreamofabear commented 4 years ago

I think updating AMP's release schedule documentation and an FYI (as you have already done) are good enough here.

dreamofabear commented 4 years ago

Allowing use of the new channel for AMP emails, ads, Stories, or Actions

Was just going to write that email, ads and actions validator specs probably shouldn't allow this. 👍

"Stories" is not a validator spec like the others though.

rcebulko commented 4 years ago

The validator changes PR splits the runtime tagspec for AMP out from AMP4EMAIL and ACTIONS.

@ampproject/wg-stories does stories use the standard runtime+extensions or are stories not necessarily AMPs themselves? Do you foresee any issues that should be addressed here?

honeybadgerdontcare commented 4 years ago

Stories are AMP, so they would be affected by this. If we don't want to support this for Stories then that's a more involved process.

On Fri, Dec 6, 2019 at 9:56 AM Ryan Cebulko notifications@github.com wrote:

The validator changes PR https://github.com/ampproject/amphtml/pull/25897 splits the runtime tagspec for AMP out from AMP4EMAIL and ACTIONS.

@ampproject/wg-stories https://github.com/orgs/ampproject/teams/wg-stories does stories use the standard runtime+extensions or are stories not necessarily AMPs themselves? Do you foresee any issues that should be addressed here?

— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub https://github.com/ampproject/amphtml/issues/25578?email_source=notifications&email_token=AAKHJFJR564B4Z7KNCDAVKLQXKG53A5CNFSM4JNBQWX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEGE3Y6Y#issuecomment-562674811, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKHJFP23YZN4FMALQ3M6J3QXKG53ANCNFSM4JNBQWXQ .

rcebulko commented 4 years ago

@honeybadgerdontcare The current status is less of "it shouldn't work on Stories" and more "we're not going to guarantee it works with Stories". If stories are normal AMP, they should work out of the box though. I wasn't sure if Stories were a special case like emails.

rcebulko commented 4 years ago

This was launched and posted about on the AMP Blog. Closing

ampproject / amphtml