Closed jdhoffa closed 8 months ago
Relates to #6 and #7
🤔 My inclination (for R packages, at least) would be to do a version bump on any merge into main
, since that would work nicely with pak
being able to pick the right/latest version and bypass a lot of tricks involving feature flags.
# DESCRIPTION
Imports:
pacta.portfolio.import (>= 0.0.3),
uuid
Remotes:
RMI-PACTA/pacta.portfolio.import
On the other hand, that's probably not super compatible with our current merge flow. Bumping versions here could get very busy, since we normally develop with features merging directly into main (rather than to a develop
branch).
This is another huge diversion from our existing development practices, and not something I imagine neither @cjyetman nor myself am likely to agree with without justification.
This also begs the question: does the individual maintainer have autonomy to decide how each repository manages their own git strategy and release process individually? I would be inclined to say "no" here as I think that unnecessarily introduces complexity into our team, but something we should absolutely discuss and document here: https://github.com/RMI-PACTA/practices/pull/12
However, directly to your point, I think the following options exist (not necessarily comprehensive, and focused to R packages):
r2dii
packages this has been CRAN
. For pacta.*
type packages I could imagine using R-universe
or CRAN
relatively easily). My strong preference here is for option 1, as it limits any unnecessary changes in existing paradigm, while offering on-par features (maybe not with what you mention around pak
).
This will be discussed collectively in a DevHangout.
Afraid to begin this debate again, but... on the P4I side, where we have not yet released anything on CRAN, I believe the Git hash is the safest identifier of "version" we can use. See https://github.com/RMI-PACTA/pacta.data.preparation/issues/99#issuecomment-1239647545
I don't think we should be afraid of this debate, I think it's crucially important that we get to the bottom of it in fact and come to some sort of agreement. No more kicking the can down the road.
Afraid to begin this debate again, but... on the P4I side, where we have not yet released anything on CRAN, I believe the Git hash is the safest identifier of "version" we can use. See RMI-PACTA/pacta.data.preparation#99 (comment)
Directly to your point, I think git hashes are adequate while these packages remain internal only (sorry, I know I didn't think this in the past, my thinking on this has evolved since then). However, I imagine versioned releases (SemVer or otherwise) may be useful if we ever intend to communicate the status and changes of packages to anyone external who may be using the code not under our supervision.
In particular, strongly sticking to SemVer can help users know and understand when an update may introduce API changes that may break their downstream workflows. If we rely only on git-hashes, and the latest version on main
we reduce that flexibility and require them to constantly use the "bleeding edge" version, even if it screws up their entire workflow.
I know this isn't currently the case, but it maybe worth considering in our discussion, as it could be a state we may want to aim for.
Worth noting that most GH-based workflows (including GH-flow, and pak
downloading) make an assumption that what is on the default branch (for us, main
) is the latest working version. Not a dev version.
If we want to batch multiple PRs into a single "release" (however we want to specify that), then we need somewhere (another branch?) to do that, and fthen once we're happy with that, we can merge them into the default
branch.
Yup totally understood. The real question is do we want that place to be main
or somewhere else (CRAN/ R-Universe/ somewhere else/ docker hub?)
I have my reservations only because of the current workflows of the team, and trying not to shake too much up at once but as always happy to be convinced otherwise!
And just to be very clear, I definitely don't think everything needs to/ should live on CRAN, I see the immense pain in the ass that could be.
There's a specific reason that the banks packages do/ should, but I don't think we need to force everything to.
definitely agree, for communicating "version" to external users we should use semver, but I think that needs to be coupled with a CRAN release, or something that forces the semver to be immutable... I don't think the semver living in the DESCRIPTION file alone is adequate, at least not in our typical git flow
I also agree that a git flow that had experimental dev branches for PRs and such, a (stable-ish) dev branch for the bleeding edge, and a stable main/release branch for expected external use would be ideal, but... that's substantially different than our typical git flows, and I'm not sure it's worth rocking that boat. Additionally, I'm not aware of any strong examples of R packages that follow a flow like that, so we'd be deviating from some expected norms.
CRAN
) is a totally reasonable expectationA question for @cjyetman does R-Universe have any mechanism for enforcing SemVer?
A question for @cjyetman does R-Universe have any mechanism for enforcing SemVer?
Not that I'm aware of. As I understand, it is pointed to a branch/tag/hash, and whenever a change is made there it automatically rebuilds. I suspect if a fancy git flow was used, one could enable two parallel r-universes, one for stable/main and one for dev/bleeding-edge. I believe that in any case, the reported "version" on r-universe will be whatever is stated in the DESCRIPTION file (in the associated target).
Got it. So for our R packages, if we intend to deviate from CRAN AND do some kind of SemVer, we would need to keep track of it and enforce it entirely ourselves.
Worth noting that most GH-based workflows (including GH-flow, and
pak
downloading) make an assumption that what is on the default branch (for us,main
) is the latest working version. Not a dev version.
a bit nit-picky, but just to be precise, pak
makes a default assumption that you want the default branch of a repo, but it can be pointed at other branches/tags/PRs/hashes/etc.
I also agree that a git flow that had experimental dev branches for PRs and such, a (stable-ish) dev branch for the bleeding edge, and a stable main/release branch for expected external use would be ideal, but... that's substantially different than our typical git flows, and I'm not sure it's worth rocking that boat. Additionally, I'm not aware of any strong examples of R packages that follow a flow like that, so we'd be deviating from some expected norms.
also importantly in this case, I would expect a strict rule (if not a GH Action that also enforces it) that any push to dev
or main
would require a version bump (in DESCRIPTION)
Got it. So for our R packages, if we intend to deviate from CRAN AND do some kind of SemVer, we would need to keep track of it and enforce it entirely ourselves.
yes, I believe so
I also agree that a git flow that had experimental dev branches for PRs and such, a (stable-ish) dev branch for the bleeding edge, and a stable main/release branch for expected external use would be ideal, but... that's substantially different than our typical git flows, and I'm not sure it's worth rocking that boat. Additionally, I'm not aware of any strong examples of R packages that follow a flow like that, so we'd be deviating from some expected norms.
also importantly in this case, I would expect a strict rule (if not a GH Action that also enforces it) that any push to
dev
ormain
would require a version bump (in DESCRIPTION)
hmm NIT: but just to be precise, I would expect a version bump on push to main
, but not on push to develop
?
In my mental model, if we use what we do in r2dii.analysis
as an example, then the switch from:
GitHub flow + CRAN -> Git-Flow + GH
would effectively correspond to a switch from:
main
as we know it now would become develop
CRAN as we know it now would become main
EDITED to include @cjyetman and @AlexAxthelm 's points:
Ah woops, I confused myself. We would still have to have the version bump on develop
at some point, just the precise logic would differ.
On develop
we would enforce that:
DESCRIPTION
must follow an x.y.z
OR x.y.z.9000
format
ANDcommit
must be > the version before commitThen on main
we would enforce that:
DESCRIPTION
must x.y.z
format
ANDDESCRIPTION
for any commit
must be >= the version before commit
ANDDESCRIPTION
for any commit
must be > the version on main
That seems pretty doable in an action.
@jdhoffa , on main
, version in commit is >
than version in prior commit (not >=
).
The general git-flow paradigm would be something like
develop
into something feature-complete for the release (no more application changes)release/0.1.0
branchrelease
(updating news, devtools::document()
, etc)release/*
into main
and develop
Ah woops, I confused myself. Obviously, we would still have to have the version bump on
develop
at some point, just the precise logic would differ.On
develop
we would enforce that:
- Version in
DESCRIPTION
must follow anx.y.z
ORx.y.z.9000
format AND- Version in any
commit
must be >= the version prior to commitThen on
main
we would enforce that:
- Version in
DESCRIPTION
mustx.y.z
format AND- Version in any
commit
must be >= the version prior to commitThat seems pretty doable in an action?
exactly, what I was thinking... though there could be an additional check to enforce that dev
version x.y.z.9000
is higher than current version on main
too
@jdhoffa , on
main
, version in commit is>
than version in prior commit (not>=
).The general git-flow paradigm would be something like
- get
develop
into something feature-complete for the release (no more application changes)- open a
release/0.1.0
branch- Do any housekeeping and version bumps on
release
(updating news,devtools::document()
, etc)- merge
release/*
intomain
anddevelop
Got it.
As per a comment from @AlexAxthelm
maintainer.md
something along the lines of:
## Versioning and Release
Maintainers are responsible for communicating overall application version through some informative version identifier, as well as determining the suitibility of any release candidate and any release schedule
FYI: ADO ticket to gather user feedback from Banks to assess if we can explore alternatives to CRAN: https://dev.azure.com/RMI-PACTA/2DegreesInvesting/_workitems/edit/9726
Good comment from @AlexAxthelm
hash communicates about state version communicates about function
We had a devHangouts
discussion about this. Below is a summary of the transcript of that meeting:
We discussed the purpose of releases and versioning, and the differences of semantic versioning and calendar versioning or simply using Git hashes for identifying specific states of the code. @AlexAxthelm and @cjyetman both noted that "hashes communicate about state, version communicates about function".
The importance of clear communication within the team about versioning and repository states is stressed, with the idea that consistent practices across the team are crucial for efficiency and understanding.
For our "vanilla R packages", e.g. those with the pacta.*
or r2dii.*
prefixes, it was agreed upon that implementing a Semantic Versioning was a good approach.
For other software bases (e.g. Transition Monitor Dockerfile, other workflows, data) it was less clear how versioning should be done, or if it was even necessary to define it. For the Transition Monitor Dockerfile, it was stated that some consistent tagging system would be useful (e.g. tag relevant images with date-built
, ch 2024
, and other useful developer facing tags to help us sleuth errors)
The use of CRAN (Comprehensive R Archive Network) for package release is mentioned, with a consideration of whether it is necessary for their workflows or if there could be alternative ways to distribute packages to banks or other users. Note: this discussion seemed to stray a bit from the "release" discussion.
Of this, the only thing I think that deserves to go into practices
is that we will use SemVer consistently for our vanilla R packages. I will follow up with a PR.
Of this, the only thing I think that deserves to go into
practices
is that we will use SemVer consistently for our vanilla R packages. I will follow up with a PR.
As I noted int he meeting, I think that this should be "we will use SemVer mostly consistently". Some packages, such as pacta.scenario.data.preparation
would be well served to have a calendar versioning, since those do have a strong time component in them.
using a version such as 2023.0.2
is technically SemVer compliant, but doesn't match the sprit of semver.
I didn't totally understand that point in the meeting, but sounds cool and understood!
r2dii.*
version numbers once they were accepted to CRANpacta.*
packages when they were used to build a "release candidate" for Transition Monitor. This was more or less just to facilitate "easily" re-building the image based on the same package versions, to be able to simulate, e.g. re-running a COP portfolio.As far as I can tell, this is now serving neither of the above, but instead the
WebApp
circumstance, so I guess this perhaps is a call to align on our definitions of a "release", otherwise I fear it ends up being a meaningless term.NIT: I will extract this into an issue in the
practices
repo._Originally posted by @jdhoffa in https://github.com/RMI-PACTA/workflow.portfolio.parsing/pull/17#discussion_r1478434437_