Release tagging process conventions have changed?

Since the start of the project, up until Emscripten 2.0.19 release ( #14087 ), we have followed a release tagging convention where:

1) new changes in trunk are accumulated under a "Current Trunk" section in ChangeLog.md. 2) the commit that bumps the version tag, e.g. 2.0.19 also receives the identically named git tag 2.0.19. 3) the files emscripten-version.txt and ChangeLog.md are updated in the very same commit that updates the version and is tagged.

This process has worked well to my understanding, without complaints(?), though maybe I missed some conversations while I have been away.

This process has had the effect that when one git clones a git branch and looks into emscripten-version.txt, when it reads 2.0.19, it should be read as "version 2.0.19 (that has already been released before) + then some commits that will become a part of next tag 2.0.20".

Starting from #14110 and #14115, we seem to have lost this procedure:

1) when one reads 2.0.20 in emscripten-version.txt, I presume it now means "this is not actually yet version 2.0.20, but will become it"? 2) when the emscripten-version.txt file is bumped in a commit, that commit no longer gets a tag that corresponds to this update. 3) the tags seem to now correspond with arbitrary points in time instead of the commit that actually did the bump. Why is that specific commit tagged as that version? The tag point looks somewhat arbitrary for this version:

Commits both before and after tag 2.0.27 have emscripten-version.txt read 2.0.27. That seems wrong?

4) reading the changelog at any given snapshot now gives false information about a release. E.g. if I take a snapshot at a3b27c41d5f1caf617fc8a0f9cd84412fed1b0ab, I get this ChangeLog for 2.0.26:

https://raw.githubusercontent.com/emscripten-core/emscripten/a3b27c41d5f1caf617fc8a0f9cd84412fed1b0ab/ChangeLog.md

whereas instead if I take a snapshot at c515787ea5ee0a9b1f65b10c32b427ccb573e74c, I get this ChangeLog for 2.0.26: https://raw.githubusercontent.com/emscripten-core/emscripten/c515787ea5ee0a9b1f65b10c32b427ccb573e74c/ChangeLog.md

Both of those ChangeLogs read like the "final" or "official" changelog for 2.0.26, without giving a clue that neither of them are actually the complete version 2.0.26.

Would it be possible to revert to the old convention where the ChangeLog would read "Current Trunk" for unreleased changes to not cause confusion, and that the git repo tag+ChangeLog+emscripten-version.txt are consistently and coherently updated in a single coordinated commit like it used to be?

This change was made in #14091 in order to allow for a smoother release process. I'm trying to find the original discussion that preceded this change (it could be that it was on an internal chat which is why I can't find it).

The primary benefits of this new system is that any emscripten change can be possible release candidate, which can be tagged retrospectively once all testing is done and without any other changes being make. An important aspect is that during this testing period there is no need to block further changes from landing and we can continue work as normal. If the RC fails testing we just pick another revision and re-start the test process.

Another advantage is that the git SHA that we actually ship in emsdk is the precise one that we end up tagging. With the old system that released version in emsdk never matched that tagged version, and the changelog that we shipped in the emsdk was always wrong.

Basically the new system tends to make this more accurate for emsdk users and simplifies the release process, and the old system made things a little more obvious/clear for git users.

The new system is more in line with how llvm works (where git llvm reports itself as VERSION+1 as soon as VERSION is released).

@dschuff @kripken

I thinks @sbc100 summarized it pretty well. The first property (retroactive revision selection) became more important when we started doing LTO builds, because they take a long time, and now that time doesn't block landing new changes.

When trying to improve the process, we couldn't think of a process that has the properties Sam mentioned, but also has the property that the tagged revision is the same revision that updates emscripten-version.txt and the release notes. So the tag is still the source of truth, but there's no good way (that we thought of) to get that while still having the update information in-band.

One distinction we might make is to consider 2 different kinds of git users: those that are checking out the head of the main branch (i.e. mostly developers working on emscripten), and those that want to build their own release of a particular numbered revision. IIUC @juj wears both hats, but this issue is written primarily about the latter use case.

For the "look at HEAD" use case, as you noted, the difference is that the version number reflects the next version rather than the previous version. At HEAD, there's not really a correct notion of "current" version number because of course most git revisions are not the exact same revision as a release. That means that emscripten-version.txt (or the changelog) can never be trusted (by itself) to give you the correct version unless you already know by other means that you're looking at a "released" revision. So IMO this is kind of a wash for the HEAD use case. We could make this a bit more clear by adding something to the explanatory text at the top of the changelog, along the lines of "If the revision you are looking at is not a tagged release revision, then the version number at the top has not been released yet, and there may be additional changes before its release".

For the use case of building your own toolchain package, the thing that you really need to do is find out which git revision is the correct one for a particular release version (or equivalently, to find out whether a release has been tagged or not). Previously you could look for a git tag, or look for a revision which updated emscripten-version.txt (but you could not look in emsdk, because it didn't match the tagged revision that had correct release notes). Now you can look in emsdk, or look for a git tag, but not for the revision which updates emscripten-version.txt. When you add in the fact that in order to figure out which version of Binaryen and LLVM you need to look at emsdk in any case, this is actually an improvement, because now you can get all the necessary information in one place.

This is probably not well-documented enough publicly; we haven't quite completely transitioned yet (because the LTO build isn't the default yet). It started out as a design targeted at the Chromium build infrastructure we use to build release binaries (which, while it's all public, is mostly a thing that external people don't really care about). But when it expanded in scope to cover the whole release process, I probably should have externalized the discussion. So I apologize for that. As I said though, while I think it's an improvement, it's not perfect and we can certainly change it further. I'd be interested in hearing people's feedback, especially if there are use cases we hadn't considered.

(This should be documented here: https://github.com/emscripten-core/emscripten/blob/main/docs/process.md#minor-version-updates-1xy-to-1xy1 , specifically point 4 about the next version number. But that should be expanded.)

Thanks for the explanations.

An important aspect is that during this testing period there is no need to block further changes from landing and we can continue work as normal.

I presume this refers to the Google waterfall? Does it run more extensive tests compared to the Github-integrated CI that is needed for each PR to land?

The new system is more in line with how llvm works (where git llvm reports itself as VERSION+1 as soon as VERSION is released).

I do not mind the switch from early tag to late tag, though currently it seems like in practice we have a "middle tag" (as shown by the screenshot above). I think that is what makes this look very confusing.

Basically the new system tends to make this more accurate for emsdk users and simplifies the release process

I think my gripe is that I am also making releases of Emscripten SDK to be bundled with classic and Tiny Unity, and for me this is perceived to complicate the release process.

It used to be clear to me how to make sure I release the same content under "Emscripten 2.0.21-unity" that "Emscripten 2.0.21" would contain. Now I felt like I can't rely on ChangeLog.md containing accurate information and I can't rely on emscripten-version.txt being accurate, matching version numbers is harder.

Though after thinking over this conversation I presume that the git tag point is the remaining source of truth - as long as I keep searching for those, I should be able to accurately match up. (then I got confused as to why the tags pointed to "arbitrary" points in time and not to the bump commits like they used to)

Overall, good rationales for the Google release process, thanks for sharing them. I'll try to adapt our release process for this.

Thanks for the explanations.

An important aspect is that during this testing period there is no need to block further changes from landing and we can continue work as normal.

I presume this refers to the Google waterfall? Does it run more extensive tests compared to the Github-integrated CI that is needed for each PR to land?

Yes, the waterfall builds and logs can be seen at https://ci.chromium.org/p/emscripten-releases/g/main/console. That builder is what produces the emsdk binaries. It runs both fewer and more tests than github CI. The "release builders" (left 3 columns) run the emscripten "core" and "other" test suites but not the browser tests. Failure on those tests will block the integration of new Emscripten, Binaryen, or LLVM revisions into emsdk. The "test suites" builder (right column) runs the LLVM regression tests, part of the LLVM test suite (which confusingly is not the same thing as the LLVM regression tests), and the emscripten test suite under ASan (none of which the github builders run). These tests do not block anything but we try to keep them green. The CircleCI testers of course have much nicer integration with github, so we rely on that for emscripten day-to-day development. So in general neither the emscripten-releases nor the CircleCI tests are really a substitute for the other.

The new system is more in line with how llvm works (where git llvm reports itself as VERSION+1 as soon as VERSION is released).

I do not mind the switch from early tag to late tag, though currently it seems like in practice we have a "middle tag" (as shown by the screenshot above). I think that is what makes this look very confusing.

Yes, I agree that the lag between the tagged revision and the update of the version file is a drawback of the new system.

Basically the new system tends to make this more accurate for emsdk users and simplifies the release process

I think my gripe is that I am also making releases of Emscripten SDK to be bundled with classic and Tiny Unity, and for me this is perceived to complicate the release process.

It used to be clear to me how to make sure I release the same content under "Emscripten 2.0.21-unity" that "Emscripten 2.0.21" would contain. Now I felt like I can't rely on ChangeLog.md containing accurate information and I can't rely on emscripten-version.txt being accurate, matching version numbers is harder.

Yeah; more precisely I would say that ChangeLog.md is now more accurate in the emsdk-released version, and still accurate at the tagged version, but less accurate at other versions. emscripten-version.txt is still accurate at tagged versions, and IMO just as accurate (i.e. not accurate) at other version.

Though after thinking over this conversation I presume that the git tag point is the remaining source of truth - as long as I keep searching for those, I should be able to accurately match up. (then I got confused as to why the tags pointed to "arbitrary" points in time and not to the bump commits like they used to)

Overall, good rationales for the Google release process, thanks for sharing them. I'll try to adapt our release process for this.

Yes, the git tag point is the source of truth, and it should always match what you get if you fetch the emscripten-releases revision from the emsdk's emscripten-releases.txt JSON file and look up the DEPS at that version. We have a utility (https://chromium.googlesource.com/emscripten-releases/+/refs/heads/main/src/release-info.py) for fetching useful information about different revisions of emscripten-releases/emsdk and the various tools. It's currently in the emscripten-releases repo because some of the modes rely having local git checkouts of the constituent repos (e.g. finding out which tagged release is the first one that contains a particular emscripten or LLVM revision). But it can do things like show the emscripten, binaryen, and LLVM revision for a particular tagged release, without local git repos, which I think is probably what you want. It's still a bit rought and not super user-friendly but I'm happy to take suggestions for improvements.

I think the key take away is that if you use git tags you will get accurate changelog and accurate emscripten-version.txt and the contents will match emsdk releases exactly.

This was not true before, since the tags were made later after and the contents of the changelog were different to the official emsdk releases.

So for those using tags I think the new system is strictly better.

emscripten-core / emscripten

Release tagging process conventions have changed? #14929