Define and document our "release" process

jdhoffa commented 9 months ago

          I have a general question to @AlexAxthelm  and @cjyetman: what is our "release" process?

Initially, we would only bump r2dii.* version numbers once they were accepted to CRAN
We then entered a situation where we "released" the pacta.* packages when they were used to build a "release candidate" for Transition Monitor. This was more or less just to facilitate "easily" re-building the image based on the same package versions, to be able to simulate, e.g. re-running a COP portfolio.

As far as I can tell, this is now serving neither of the above, but instead the WebApp circumstance, so I guess this perhaps is a call to align on our definitions of a "release", otherwise I fear it ends up being a meaningless term.

NIT: I will extract this into an issue in the practices repo.

_Originally posted by @jdhoffa in https://github.com/RMI-PACTA/workflow.portfolio.parsing/pull/17#discussion_r1478434437_

jdhoffa commented 9 months ago

Relates to #6 and #7

AlexAxthelm commented 9 months ago

🤔 My inclination (for R packages, at least) would be to do a version bump on any merge into main, since that would work nicely with pak being able to pick the right/latest version and bypass a lot of tricks involving feature flags.

# DESCRIPTION
Imports: 
    pacta.portfolio.import (>= 0.0.3),
    uuid
Remotes:
    RMI-PACTA/pacta.portfolio.import

On the other hand, that's probably not super compatible with our current merge flow. Bumping versions here could get very busy, since we normally develop with features merging directly into main (rather than to a develop branch).

jdhoffa commented 9 months ago

This is another huge diversion from our existing development practices, and not something I imagine neither @cjyetman nor myself am likely to agree with without justification.

This also begs the question: does the individual maintainer have autonomy to decide how each repository manages their own git strategy and release process individually? I would be inclined to say "no" here as I think that unnecessarily introduces complexity into our team, but something we should absolutely discuss and document here: https://github.com/RMI-PACTA/practices/pull/12

However, directly to your point, I think the following options exist (not necessarily comprehensive, and focused to R packages):

Continue using the lightweight https://docs.github.com/en/get-started/using-github/github-flow that we have up until now, and rely on a separately hosted area for "releases" (for r2dii packages this has been CRAN. For pacta.* type packages I could imagine using R-universe or CRAN relatively easily).
Implement the git-flow strategy across all of our repos, and adequately support all of our developers on how to learn an implement it (@MonikaFu and @jdhoffa have had super annoying experiences with this in the past as it was implemented in https://github.com/RMI-PACTA/r2dii.plot without any support. See this blocking issue that prevented us from doing a release for 1 year here: https://github.com/RMI-PACTA/r2dii.plot/issues/489)
Some mixed bag option where each maintainer implements whatever they wish.

My strong preference here is for option 1, as it limits any unnecessary changes in existing paradigm, while offering on-par features (maybe not with what you mention around pak).

This will be discussed collectively in a DevHangout.

cjyetman commented 9 months ago

Afraid to begin this debate again, but... on the P4I side, where we have not yet released anything on CRAN, I believe the Git hash is the safest identifier of "version" we can use. See https://github.com/RMI-PACTA/pacta.data.preparation/issues/99#issuecomment-1239647545

jdhoffa commented 9 months ago

I don't think we should be afraid of this debate, I think it's crucially important that we get to the bottom of it in fact and come to some sort of agreement. No more kicking the can down the road.

jdhoffa commented 9 months ago

Afraid to begin this debate again, but... on the P4I side, where we have not yet released anything on CRAN, I believe the Git hash is the safest identifier of "version" we can use. See RMI-PACTA/pacta.data.preparation#99 (comment)

Directly to your point, I think git hashes are adequate while these packages remain internal only (sorry, I know I didn't think this in the past, my thinking on this has evolved since then). However, I imagine versioned releases (SemVer or otherwise) may be useful if we ever intend to communicate the status and changes of packages to anyone external who may be using the code not under our supervision.

In particular, strongly sticking to SemVer can help users know and understand when an update may introduce API changes that may break their downstream workflows. If we rely only on git-hashes, and the latest version on main we reduce that flexibility and require them to constantly use the "bleeding edge" version, even if it screws up their entire workflow.

I know this isn't currently the case, but it maybe worth considering in our discussion, as it could be a state we may want to aim for.

AlexAxthelm commented 9 months ago

Worth noting that most GH-based workflows (including GH-flow, and pak downloading) make an assumption that what is on the default branch (for us, main) is the latest working version. Not a dev version.

If we want to batch multiple PRs into a single "release" (however we want to specify that), then we need somewhere (another branch?) to do that, and fthen once we're happy with that, we can merge them into the default branch.

jdhoffa commented 9 months ago

Yup totally understood. The real question is do we want that place to be main or somewhere else (CRAN/ R-Universe/ somewhere else/ docker hub?)

I have my reservations only because of the current workflows of the team, and trying not to shake too much up at once but as always happy to be convinced otherwise!

jdhoffa commented 9 months ago

And just to be very clear, I definitely don't think everything needs to/ should live on CRAN, I see the immense pain in the ass that could be.

There's a specific reason that the banks packages do/ should, but I don't think we need to force everything to.

cjyetman commented 9 months ago

definitely agree, for communicating "version" to external users we should use semver, but I think that needs to be coupled with a CRAN release, or something that forces the semver to be immutable... I don't think the semver living in the DESCRIPTION file alone is adequate, at least not in our typical git flow

I also agree that a git flow that had experimental dev branches for PRs and such, a (stable-ish) dev branch for the bleeding edge, and a stable main/release branch for expected external use would be ideal, but... that's substantially different than our typical git flows, and I'm not sure it's worth rocking that boat. Additionally, I'm not aware of any strong examples of R packages that follow a flow like that, so we'd be deviating from some expected norms.

jdhoffa commented 9 months ago

I think enforcing SemVer only when paired with a legitimate release (e.g. on CRAN) is a totally reasonable expectation
I share the hesitations with git-flow, for the same reason, but not so against it that I wouldn't consider it. But also don't really feel it's any kind of priority

A question for @cjyetman does R-Universe have any mechanism for enforcing SemVer?

cjyetman commented 9 months ago

A question for @cjyetman does R-Universe have any mechanism for enforcing SemVer?

Not that I'm aware of. As I understand, it is pointed to a branch/tag/hash, and whenever a change is made there it automatically rebuilds. I suspect if a fancy git flow was used, one could enable two parallel r-universes, one for stable/main and one for dev/bleeding-edge. I believe that in any case, the reported "version" on r-universe will be whatever is stated in the DESCRIPTION file (in the associated target).

jdhoffa commented 9 months ago

Got it. So for our R packages, if we intend to deviate from CRAN AND do some kind of SemVer, we would need to keep track of it and enforce it entirely ourselves.

cjyetman commented 9 months ago

Worth noting that most GH-based workflows (including GH-flow, and pak downloading) make an assumption that what is on the default branch (for us, main) is the latest working version. Not a dev version.

a bit nit-picky, but just to be precise, pak makes a default assumption that you want the default branch of a repo, but it can be pointed at other branches/tags/PRs/hashes/etc.

cjyetman commented 9 months ago

I also agree that a git flow that had experimental dev branches for PRs and such, a (stable-ish) dev branch for the bleeding edge, and a stable main/release branch for expected external use would be ideal, but... that's substantially different than our typical git flows, and I'm not sure it's worth rocking that boat. Additionally, I'm not aware of any strong examples of R packages that follow a flow like that, so we'd be deviating from some expected norms.

also importantly in this case, I would expect a strict rule (if not a GH Action that also enforces it) that any push to dev or main would require a version bump (in DESCRIPTION)

cjyetman commented 9 months ago

Got it. So for our R packages, if we intend to deviate from CRAN AND do some kind of SemVer, we would need to keep track of it and enforce it entirely ourselves.

yes, I believe so

jdhoffa commented 9 months ago

I also agree that a git flow that had experimental dev branches for PRs and such, a (stable-ish) dev branch for the bleeding edge, and a stable main/release branch for expected external use would be ideal, but... that's substantially different than our typical git flows, and I'm not sure it's worth rocking that boat. Additionally, I'm not aware of any strong examples of R packages that follow a flow like that, so we'd be deviating from some expected norms.

also importantly in this case, I would expect a strict rule (if not a GH Action that also enforces it) that any push to dev or main would require a version bump (in DESCRIPTION)

hmm NIT: but just to be precise, I would expect a version bump on push to main, but not on push to develop? In my mental model, if we use what we do in r2dii.analysis as an example, then the switch from: GitHub flow + CRAN -> Git-Flow + GH would effectively correspond to a switch from: main as we know it now would become develop CRAN as we know it now would become main

jdhoffa commented 9 months ago

EDITED to include @cjyetman and @AlexAxthelm 's points:

Ah woops, I confused myself. We would still have to have the version bump on develop at some point, just the precise logic would differ.

On develop we would enforce that:

Version in DESCRIPTION must follow an x.y.z OR x.y.z.9000 format AND
Version in any commit must be > the version before commit

Then on main we would enforce that:

Version in DESCRIPTION must x.y.z format AND
Version in DESCRIPTION for any commit must be >= the version before commit AND
Version in DESCRIPTION for any commit must be > the version on main

That seems pretty doable in an action.

AlexAxthelm commented 9 months ago

@jdhoffa , on main, version in commit is > than version in prior commit (not >=).

The general git-flow paradigm would be something like

get develop into something feature-complete for the release (no more application changes)
open a release/0.1.0 branch
Do any housekeeping and version bumps on release (updating news, devtools::document(), etc)
merge release/* into main and develop

cjyetman commented 9 months ago

Ah woops, I confused myself. Obviously, we would still have to have the version bump on develop at some point, just the precise logic would differ.

On develop we would enforce that:

Version in DESCRIPTION must follow an x.y.z OR x.y.z.9000 format AND

Version in any commit must be >= the version prior to commit

Then on main we would enforce that:

Version in DESCRIPTION must x.y.z format AND

Version in any commit must be >= the version prior to commit

That seems pretty doable in an action?

exactly, what I was thinking... though there could be an additional check to enforce that dev version x.y.z.9000 is higher than current version on main too

jdhoffa commented 9 months ago

@jdhoffa , on main, version in commit is > than version in prior commit (not >=).

The general git-flow paradigm would be something like

get develop into something feature-complete for the release (no more application changes)

open a release/0.1.0 branch

Do any housekeeping and version bumps on release (updating news, devtools::document(), etc)

merge release/* into main and develop

Got it.

jdhoffa commented 9 months ago

As per a comment from @AlexAxthelm

[ ] Once we have discussed and agreed on this, there should be an addition also to the maintainer.md something along the lines of:

## Versioning and Release
Maintainers are responsible for communicating overall application version through some informative version identifier, as well as determining the suitibility of any release candidate and any release schedule

jdhoffa commented 9 months ago

FYI: ADO ticket to gather user feedback from Banks to assess if we can explore alternatives to CRAN: https://dev.azure.com/RMI-PACTA/2DegreesInvesting/_workitems/edit/9726

jdhoffa commented 8 months ago

Good comment from @AlexAxthelm

hash communicates about state version communicates about function

jdhoffa commented 8 months ago

We had a devHangouts discussion about this. Below is a summary of the transcript of that meeting:

Versioning and Release Strategy

We discussed the purpose of releases and versioning, and the differences of semantic versioning and calendar versioning or simply using Git hashes for identifying specific states of the code. @AlexAxthelm and @cjyetman both noted that "hashes communicate about state, version communicates about function".

Internal Communication and Consistency

The importance of clear communication within the team about versioning and repository states is stressed, with the idea that consistent practices across the team are crucial for efficiency and understanding.

SemVer for R Packages

For our "vanilla R packages", e.g. those with the pacta.* or r2dii.* prefixes, it was agreed upon that implementing a Semantic Versioning was a good approach.

Other types of code

For other software bases (e.g. Transition Monitor Dockerfile, other workflows, data) it was less clear how versioning should be done, or if it was even necessary to define it. For the Transition Monitor Dockerfile, it was stated that some consistent tagging system would be useful (e.g. tag relevant images with date-built, ch 2024, and other useful developer facing tags to help us sleuth errors)

Potential Use of CRAN

The use of CRAN (Comprehensive R Archive Network) for package release is mentioned, with a consideration of whether it is necessary for their workflows or if there could be alternative ways to distribute packages to banks or other users. Note: this discussion seemed to stray a bit from the "release" discussion.

jdhoffa commented 8 months ago

Of this, the only thing I think that deserves to go into practices is that we will use SemVer consistently for our vanilla R packages. I will follow up with a PR.

AlexAxthelm commented 8 months ago

Of this, the only thing I think that deserves to go into practices is that we will use SemVer consistently for our vanilla R packages. I will follow up with a PR.

As I noted int he meeting, I think that this should be "we will use SemVer mostly consistently". Some packages, such as pacta.scenario.data.preparation would be well served to have a calendar versioning, since those do have a strong time component in them.

using a version such as 2023.0.2 is technically SemVer compliant, but doesn't match the sprit of semver.

jdhoffa commented 8 months ago

I didn't totally understand that point in the meeting, but sounds cool and understood!

RMI-PACTA / practices