Closed steven-murray closed 3 years ago
I had a bit thoughts. While one can argue on both side that increased automation can lead to more or less freedom, I, in general, prefer maximizing it...
Therefore, I vote for setuptools_scm
and bumpversion
. I'm not sure if we want to use python-semantic-release
as for CHANGELOG
, I do prefer updating it manually as git commit
often contain unnecessary contents in particular for new users and we probably don't want them to update CHANGELOG
as this is viewed by other users.
Perhaps there is a way to cherry pick content in git log
when updating the CHANGELOG
using python-semantic-release
, but then it might just be like copying paste to CHANGELOG
. There is anther reason that I prefer CHANGELOG to be written manually preferably by admin who approve the PR. I don't think it worth outlining too many details in it as git log
should already serve this purpose and CHANGELOG
should be concise. Then it is the same for PyPI
and releasing, I believe this is the admin's responsibility rather than usual developer.
I think perhaps having a master_dev
for developers is a better option?
Both @BradGreig and @qyx268 agree that using essentially the git-flow branching model, in which version updates are done specifically through a dedicated PR is the best idea. Probably using setuptools_scm
is better, just because it gives a more explicit version. Trying to use bump2version
is a good idea if possible.
As for CHANGELOG, we should update it manually. Which means we should always encourage devs to update the changelog in their PRs. Should be on a checklist so we remember.
Someone should check out commitizen in practice to see if it would be too much overhead. If it's not too much overhead, it will ave two benefits: 1) nice commits, 2) the ability to semi-automatically set the version update.
I will try to write up a specific "manual" for how this should work, and see if everyone agrees on details.
So, here's how I think it would look, based on this discussion.
Firstly, we create a develop
branch, which is protected in the sense that we can't push directly to it, and all features must be made in PRs that must pass tests/coverage/other checks before merging to develop. Note that API-breaking features must NOT be merged to develop unless it has been decided that the next version is to be a major version update. They can sit and wait as PRs until such time as they are ready. To make a new minor release would then look like this:
release/vX.Y
branch off of develop.develop
at this point, until the release branch is merged. release/vX.Y
branch. bumpversion minor
, which should update the CHANGELOG version, and also make a git tag, which updates the package version via setuptools_scm
. Any bugfixes that are urgent can be made in fix/fix-name
branches, and merged directly to master. I can probably set up a Github Action that automatically updates the version number in this case, and deploys (since we know the version to update is the patch version).
Somehow, we then also have to merge master
back into develop
. This is hard because develop
is protected. It would be nice if it could happen automatically -- I'll look into it.
Note that since the versioning happens via git tag, if anything goes wrong and we have to manually update the version on master, we can. This will also auto-trigger deployment.
Just a note that using bumpversion
is really out of the question, since it requires updating text in the setup.cfg at the very least. This means we have to do it before merging, which means the git tag would point to the pre-merged commit, rather than the merged one.
There's a few things that don't quite work in the above scheme:
git push --tags
. This is annoying, but I can't find a way around it. There's no way for Github Actions to know where the merge is coming from (which is crazy!) so it can't figure out what kind of bump to do based on the branch name. I can't think of any other information that it could use, other than the commit messages, which we already decided weren't stable enough to let it happen automatically from them. So, manual it is.One more consideration: if the "main" branch we work off is dev
, we should probably make that the "default branch". Another option is to use master
as our default branch, acting like dev
, and make a production
branch or something like that, which would act like master.
In this case, let's do master + production branch then.
This issue is here to help us discuss and decide (at least initially) on a way to manage version releases of the code, and, very much related, how to setup our branching.
Firstly, the only thing I'm really firm on is that we should be following semantic versioning. In brief, this means that any released version should be called
vX.Y.Z
, whereX
is a major version and is updated if any of the changes introduced are breaking changes.Y
is a minor version which is updated if any new (non-breaking) features are introduced (either physical or code features), andZ
is a patch version which is updated if a bug fix is introduced.One thing to keep in mind is that whatever we decide on we can change in the future, if we find it's not suiting us (except for the semantic versioning!)
There are a few different kinds of thing we need to decide on, and they all kind of affect each other. Let me lay out some of these options for different categories, and then I'll try to present a few wholistic plans that we can perhaps vote on.
Components To Decide On
Version Storage Location
Currently, the version itself is saved as a string in the main
__init__.py
as__version__
, and is manually updated whenever a new version is created. Most places that need to refer to the version can programmatically access that version, so that there's only one number to update, with two exceptions: (1) the CHANGELOG, which also has to be manually updated, and (2) thegit
tag for the version.Another alternative is to use
setuptools_scm
, which enables setting the version based on the latestgit
tag. That way, the definitive version of the code is in the git tag itself, and the__version__
in__init__
just knows how to access that. This takes away one of the exceptions from the current method. This method also has some other benefits -- the version can be extended to contain information about how far the current code is from that version exactly, and can contain the current git hash. This aids reproducibility.To be succinct, the two options here are:
__init__.py
setuptools_scm
.Version Bumping Tool
As I already said, currently we manually change the string version in
__init__.py
to change the official version. This has a number of pros/cons, eg. it can be easy to get the version "wrong" (update the wrong number). But it's also more flexible, in that if we decide that a certain version is to be set, we can do it directly. Since the update is in the codebase itself, it also means that a whole PR must be made, reviewed, and merged before the version is updated. This could be seen as a pro or a con!Some alternatives here are:
setuptools_scm
)__init__.py
and/or the git tag (should also be able to configure it updating the changelog, I think). Note that this is still manual in the sense that we have to runbumpversion minor
or something like that (and have to decide on which semantic part to update).__init__
kind of version location, but it has the advantage of deciding for you which part of the version to update, and updating your changelog etc.Another thing to decide in this category (but this is highly related to what kind of branching structure we adopt, see below), is "how automatic" we want the version updates to be. It is possible to make them essentially fully automatic, by using Github Actions and
setuptools_scm
. Let me list some options for automatic-ness:python-semantic-release
inside a GitHub action to automatically create a new PR that updates the relevant version digit, and updates the CHANGELOG etc. based on the commit history (see caveats below).setuptools_scm
is enough to create the new version (and can trigger PyPI deploy). Cons here (other than the caveat of relying on the commit history) is that it won't update the CHANGELOG, so all the current changes will still be under thedev
heading. Maybe this is OK, and we just update the CHANGELOG on the next commit, but it's still a bit annoying. Also, unless we specify a kind of interesting branching setup, this will happen on EVERY PR to master, which means the version will bump on every PR. This may or may not be desirable.Some notes on the commit history: the only real way to automatic version bumps (i.e. to know which digit to bump) is to go off the commit history. This requires everyone using a specific commit message template. I am actually very much in favour of this in general -- I think using a specific convention for commits makes searching and reading the commit history nicer as well. A popular convention is the AngularJS convention. The convention can be somewhat enforced on all developers by using commitizen, but I've never used it. The cons of this approach that I can see are (1) a little extra overhead for development and (2) if the conventions are not obeyed properly, then an automatic update will be both wrong and we may not be able to intervene.
Deployment Method (to PyPI)
Currently, the method for deployment is that I manually run
python setup.py sdist; twine upload
on my own computer whenever the version is manually bumped. It's easy to forget to do this.It's fairly easy to setup up automatic deployment to PyPI using Github Actions, however the trick is deciding when to deploy and how to trigger that. This will depend a lot on our branching and choices for automation above. Some options:
setuptools_scm
, though perhaps we could get it to work withbump2version
. I guess it could work with any system so long as we also do agit tag
after a version update. This would remove the second con of (1), but not the first. Also adds the ability to have the git version be inconsistent with the actual version in the__init__.py
if not usingsetuptools_scm
or some other automatic version tool.vX.Y.Z
, orrelease-XXX
). That would mean we'd have to stick to a certain kind of branching scheme, but I think everything else could be automated.Note that usually deployment to
conda-forge
is automatic based on deployment to PyPI.Changelog Update
We keep a CHANGELOG to let users know what's changed in each version. Currently, this is manually managed, and we try to make developers remember to update the changelog on every PR, adding what's changed in that PR under the
dev
heading. Then, when it comes time to update the version, thatdev
gets changed to the version number, and a new emptydev
section is added. Note that since doing this means changing actual files in the repo, it must be done in a PR (either a PR dedicated to cutting a release, or just in the final PR that's going into the release). I don't see any real way around this, which is why the "fully-automated" idea of version bumping presented above doesn't quite work.There are however some options for compiling the changelog automatically (still has to be checked in a PR) from the commit history. For example,
python-semantic-release
does this for you. This would mean that developers don't have to remember to update the changelog in each PR, they just have to write decent commit messages. Either way, I guess we get to have final say on the output before merging in the PR.Release Timing Strategy
A lot of what we decide will depend on how we want to time releases. I can think of three options:
Branching
I've already referred many times to branching, because how we manage branches is going to have a big impact on our other choices. There are two big streams of thought for branching (and lots in between): continuous-delivery (or the "GitHub Flow" model) or the git-flow model. The first is much simpler (and essentially what we're doing now), while the second is more complicated (perhaps too complicated for us). However, while the first is really useful for web-development, where no other libraries are depending on yours, and you just want to get changes out and visible as quickly as possible, the second is really useful for libraries on which other code depends. We could argue that
21cmFAST
is somewhere between -- it has library-like components, and certainly other people's code will rely on it, but it's really used more as an application.There are a few considerations for branching. First, there is the possibility to "protect" branches in GitHub, which means that code will never be merged into that branch without passing tests and other checks (possibly reviews). The protected branches can be arbitrary (and can be set based on patterns, so eg.
vX.Y.Z
could be protected). Currently, we protect master like this. This makes master essentially "always correct". For some of the above choices of automatic deployment etc., we could also make master always correspond to the latest deployed version on PyPI. This may or may not be useful (pros are that if someone installs directly from github, they're gauranteed to have a "release", not something in-between. But if someone is doing that, it's likely that they might be looking for something slightly more bleeding-edge).Secondly, on GitHub, merging branches is an opportunity to insert a GitHub Action that can "do stuff", like deploying, bumping versions, etc. We can try use that to our advantage.
Thirdly, the reason to use branches at all is to develop new code in isolation, and have the ability to (1) discard the new changes before it affects anything else, and (2) control the timing of the inclusion of that code into the "official" branch. Some interesting scenarios pop up in this regard. For example, if one has finished work on a "feature" (say it's not API breaking, so inclusion should essentially bump the
minor
version), and merges it into "master" (but doesn't actually bump the version and deploy yet -- so users don't get the change, since we're waiting for a few new features to be added before bumping), but then we notice a bug that we really should fix straight away, then we can't merge that to master and deploy without also deploying the feature. So instead of a nice little patch update, we're getting a full minor update, before we possibly intended to. This is compounded in the case that someone is working on an API-breaking update.Clearly, the choice of branching is going to depend on the choice of release timing strategy a lot.
Here's a few options:
master
, and let all features be developed branches that are merged directly tomaster
. Pros: simple, no change from current method, all updates are quick to deploy. Cons: no way to manage releases (i.e., can't do 2-4 of the release timing strategy options).master
anddevelop
. All feature branches are branched offdevelop
, and merged back there directly. As many PRs as we want in a version can be merged, and then avX.Y.Z
branch (or something similar) is created off of develop, which is purely for doing last-minute stuff before release (eg. bumping the version, fixing the changelog). This is merged into master, and deploy actions are automatic off of that branch. Bugfixes are branched off master directly, and merged back into there. Pros: not too complex, most automation is enabled, can combine minor feature updates into one release. Cons: no way to isolatemajor
updates (i.e. once a major update is merged todevelop
, the next release must be amajor
update, barring patches).master
and any branch namedvX.Y
. At any time, there may be multiplevX.Y
branches (eg.v3.1
,v3.2
andv4.0
). Any feature is set to merge to which version it is milestoned to be merged to. Later versions constantly rebase onto earlier versions, and must be merged after earlier versions (of course). Patches can be directly merged to master. Pros: not too complex, most automation is enabled, enables blocking on major versions. Cons: may be a little harder to maintain than thegit-flow
strategy, since there's not a central location to rebase on (i.e.develop
) for feature branches. Also, since the version branches are protected, one can't directly make the final bump-version kinds of changes to the branch (you'd have to create another PR that does that, to be merged intovX.Y
which then gets merged tomaster
). Noting that each merge requires testing (which takes a while), this might not be desirable.vX.Y
PR to do the final version-bumping stuff. Pros: really simple, automation enabled by matching to the branch name, can accumulate changes before release. Cons: can't get around fact that once aminor
update is in, hotfixes can't be released without increasing theminor
version. Thus it will result in larger version numbers OR slower bugfix updates.Note that in all approaches in which one creates a specifically-named release branch, full automation of updating the version number is not possible -- at least, I don't know of a code that will automatically create a new branch/PR based on the automatically-detected next number. I guess it could still be done, especially with the new GitHub API. One could still use something like
python-semantic-release
to guess the next version number, and then manually make that branch.I think I'll leave the issue there for now, and let people discuss this at a high level. Soon, I'll put forward some particular scenarios combining each of these options. Especially looking for input from @andreimesinger, @qyx268 and @BradGreig.