AstarVienna / irdb

A database containing instrument data for infrared telescopes
GNU General Public License v3.0
6 stars 8 forks source link

Check package versions using git tags #116

Open teutoburg opened 1 year ago

teutoburg commented 1 year ago

There used to be a test checking the version information in each package against the information in the packages.yaml file (at least for all packages present in that file). Since we deprecated the use of this file, more or less, these tests now fail when a package version is changed. In #115, I disables the test temporarily, but some form of it should probably still exist.

My idea would be to use git tags for this. There will now be a separate tag for each package released in this "Summer 2023" release, and there should be new tags for any new release. The test could then:

a. read all the existing tags for each package, pick the newest, and check if the version set in that package's version.yaml file matches; or b. read the version info in each package's version.yaml file, and check if there's a matching git tag.

Additionally, the test could verify if the tag points to a commit that actually changed the version.yaml file, to make sure we're pointing to the right thing.

@hugobuddel, thoughts?

teutoburg commented 1 year ago

In either way, I propose to standardize on the following structure for the names of the tags: pkg_name.YYYY-MM-DD, since that matches the names of the zip files on the server. So, e.g. MICADO.2023-07-11

hugobuddel commented 1 year ago

Proposal for test and for standardized tags accepted.

I've created tags for MORFEO / MICADO / METIS on a963211 , which is on dev_master, which is a bit annoying, but well, better that than to not have tags.

teutoburg commented 1 year ago

Some more weekend-thoughts on IRDB versioning and workflow. Disclaimer: The following is meant to be read as a formalized train of thought from A to B, hence the "draft", "conflict", "solution", "updated draft". Hope that makes sense...

Notation

Versioning requirements

Versioning draft

IRDB uses Calender Versioning to mark different versions for instrument packages, making it easy to keep track of changes across time. The scheme for the version number is pkg_name.YYYY-0M-0D, using the notation from CalVer.

Example workflow for a single IRDB package

Square boxes indicate a commit that sets the version number in the version.yaml file. The package is built and pushed to the server using the version in this commit. A tag is created on this commit with the package name and matching new version number.

gitGraph
   commit type: HIGHLIGHT tag: "2023-04-20"
   branch feature_1
   checkout feature_1
   commit
   commit type: HIGHLIGHT tag: "2023-07-05.dev"
   commit
   commit type: HIGHLIGHT tag: "2023-07-10.dev"
   checkout main
   merge feature_1
   commit type: HIGHLIGHT tag: "2023-07-11"
   branch feature_2
   checkout feature_2
   commit
   checkout main
   merge feature_2
   commit type: HIGHLIGHT tag: "2023-07-18"
   branch feature_3
   checkout feature_3
   commit
   commit type: HIGHLIGHT tag: "2023-07-25.dev"

Conflict

An issue occurs when e.g. the following all happens on a single day:

gitGraph
   commit type: HIGHLIGHT tag: "2023-04-20"
   branch feature_1
   checkout feature_1
   commit
   commit type: HIGHLIGHT tag: "2023-07-11.dev"
   commit
   commit type: HIGHLIGHT tag: "2023-07-11.dev??"
   checkout main
   merge feature_1
   commit type: HIGHLIGHT tag: "2023-07-11"
   branch feature_2
   checkout feature_2
   commit
   checkout main
   merge feature_2
   commit type: HIGHLIGHT tag: "2023-07-11.dev??"
   branch feature_3
   checkout feature_3
   commit
   commit type: HIGHLIGHT tag: "2023-07-11.dev??"

How should the version numbers be assigned, such that the requirements outlined above are met at any point in this process?

Possible solution

A possibility could be to append an alphabetic _MICRO to the version number, if a release with the same date already exists, regardless of dev or not. The example above could then look like this:

gitGraph
   commit type: HIGHLIGHT tag: "2023-04-20"
   branch feature_1
   checkout feature_1
   commit
   commit type: HIGHLIGHT tag: "2023-07-11.dev"
   commit
   commit type: HIGHLIGHT tag: "2023-07-11_a.dev"
   checkout main
   merge feature_1
   commit type: HIGHLIGHT tag: "2023-07-11_b"
   branch feature_2
   checkout feature_2
   commit
   checkout main
   merge feature_2
   commit type: HIGHLIGHT tag: "2023-07-11_c"
   branch feature_3
   checkout feature_3
   commit
   commit type: HIGHLIGHT tag: "2023-07-11_d.dev"

Updated versioning draft

IRDB uses Calender Versioning to mark different versions for instrument packages, making it easy to keep track of changes across time. The scheme for the version number is pkg_name.YYYY-0M-0D_MICRO, using the notation from CalVer, where the final part _MICRO is appended only if a release with the same date already exists, regardless of dev or not. This _MICRO suffix is alphabetical ([a-z]). As an example, assuming a package of the name mypackage, a version number mypackage.2023-07-11_a would be created, if a release mypackage.2023-07-11 or mypackage.2023-07-11.dev already exists. Equally, a version number mypackage.2023-07-11_a.dev would be created, if a release mypackage.2023-07-11 or mypackage.2023-07-11.dev already exists.

In other words, the version number must match the regex ^([A-Za-z]*\.\d{4}\-(0[1-9]|1[012])\-(0[1-9]|[12][0-9]|3[01]))(\_[a-z])?(\.dev)?$ and the substring between the two dots must be unique across all versions of that package.

Changeover

This is still compatible with my previous comment, so would not break what we did in the last few days.

hugobuddel commented 1 year ago

Seems like a great plan! And, wow, these mermaid graphs look cool. 1 question with some suggestions and three minor comments.

Three comments first:

Now for the question: What does the version.yaml contain for the circular commits? I suppose it is the same as the last square commit, and thus the previous tag. That old version could become a problem if someone somehow ends up using a non-tagged commit as an instrument package, and they then encounter a bug. E.g. someone checking out feature_1 when it was still at 1-48a2da3 (the first commit in feature_1, before the .dev release), or a supervisor unzipping a tarball sent by a student. Then that user could complain that version 2023-04-20 is broken, because that is the version in their version.yaml, but in reality they are not running that stable version at all, but a modified version.

There are several ways to 'fix' that outdated-version problem:

  1. Not include the version number in the source code at all, and generate the version (that is, version.yaml) automatically when publishing the package. E.g. if there is a tag, the zip file gets a version.yaml with the tag, and if there is no tag, the version will be based on the commit hash. This would keep the commit graph as above.
  2. Create the square commits each on their own branch, instead of on the main branch or on the feature branches. That way the main branch will never have an actual version number in the main or feature branches, just a placeholder like develop or so. The first commit graph below describes this option.
  3. Add a new commit immediately after the square commits, with an untagged and unreleased version number. That's kinda what I tried to do with ScopeSim etc by incrementing the patch version and adding the -alpha suffix. The second commit graph below describes this option.

Options 2 and 3 above both have exactly one commit in the repository with a specific version in version.yaml, namely those with the tags. The other commits just have a placeholder. Option 1 above has no commit with any version in it whatsoever. So it becomes more complicated to have a local directory with a version.yaml that contains an incorrect version. My preference is probably the last option, but I can be persuaded otherwise. I currently feel I would like to have the outdated-version problem addressed one way or the other though. What do you think @teutoburg, is this a problem at all, and if so, (how) should we address it?

Below are two commit graph, mainly so I could exercise using mermaid.

The commit graph of your first figure, adapted for suggestion 2. This looks horrible in mermaid, but all the 'release branches' are supposed to be dead ends, so multiple of them could be on the same height to make the figure easier to read. I also tried to put the release branches for stable versions above the line for the main branch, but that doesn't seem to be possible.

gitGraph
   commit
   branch "2023-04-20"
   commit type: HIGHLIGHT tag: "2023-04-20"
   checkout main
   branch feature_1
   checkout feature_1
   commit
   branch "2023-07-05.dev"
   commit type: HIGHLIGHT tag: "2023-07-05.dev"
   checkout feature_1
   commit
   branch "2023-07-10.dev"
   commit type: HIGHLIGHT tag: "2023-07-10.dev"
   checkout main
   merge feature_1
   branch "2023-07-11"
   commit type: HIGHLIGHT tag: "2023-07-11"
   checkout main
   branch feature_2
   checkout feature_2
   commit
   checkout main
   merge feature_2
   branch "2023-07-18"
   commit type: HIGHLIGHT tag: "2023-07-18"
   checkout main
   branch feature_3
   checkout feature_3
   commit
   branch "2023-07-25.dev"
   commit type: HIGHLIGHT tag: "2023-07-25.dev"

Commit graph for option 3. Here I used the REVERSE type to indicate that the version is reset to the placeholder. (So in fact, it does actually reverse the previous commit.)

gitGraph
   commit type: HIGHLIGHT tag: "2023-04-20"
   commit type:REVERSE
   branch feature_1
   checkout feature_1
   commit
   commit type: HIGHLIGHT tag: "2023-07-05.dev"
   commit type:REVERSE
   commit
   commit type: HIGHLIGHT tag: "2023-07-10.dev"
   commit type:REVERSE
   checkout main
   merge feature_1
   commit type: HIGHLIGHT tag: "2023-07-11"
   commit type:REVERSE
   branch feature_2
   checkout feature_2
   commit
   checkout main
   merge feature_2
   commit type: HIGHLIGHT tag: "2023-07-18"
   commit type:REVERSE
   branch feature_3
   checkout feature_3
   commit
   commit type: HIGHLIGHT tag: "2023-07-25.dev"
   commit type:REVERSE
hugobuddel commented 1 year ago

After bit more thought, I'm now in favor of option 1, never include a version in version.yaml in the repository, but generate it from the tag on release.

About complexity and automation

Whatever scheme we do, I'm assuming it will be primarily automated. As in, all three options I give are complex enough to become error-prone when done manually. But I think complexity should only matter w.r.t. whether people can understand our flow. I'd rate the clarity of my three options from easy to hard as Option 1, Option 3, Option 2, simply by looking at the figures. So that would be a plus for option 1.

About versions in the documentation

I realized there will always be other places with the version number than version.yaml. E.g. the changelog, or notes in the documentation like added in version x.y, deprecated in version x.y, etc. And just before a release, these numbers can refer to the soon-to-be-released version. So option 1 above (with no version in version.yaml in any commit), would still require a commit that puts the correct version number in place in the documentation.

My main reluctance to advocate for option 1 was that this will tie us to git: e.g. if you'd do a git archive, it would be quite hard to figure out the version of that archive. But that is just not true: the changelog should have a top entry that summarizes the changes for the upcoming version (using a placeholder for the version string), and the next entry in the changelog should list the current-stable version. So you can trivially pinpoint the version of such a (hypothetical) git archive. So that is a point in favor of option 1 for me.

What do others do?

Other projects seem to go for option 1 as well.

https://softwareengineering.stackexchange.com/questions/388584/versioning-where-exactly-in-the-code-or-repository-should-i-write-the-version-n

https://github.com/astropy/astropy/blob/main/astropy/version.py

teutoburg commented 1 year ago

Comments on your comments etc.:

Three comments first:

  • Seeing main in the figures: maybe ditching dev_master would be a great moment to ditch master as well and go for main. That might even make things less confusing. What do you think?

That was my idea as well. I don't want to get into the whole main vs. master argument (frankly I don't really care). But I think it will be a good way to emphasize that we switch to a different workflow. And if it has the side-effect of being more "modern" with main, all the better!

  • I suppose we can ditch irdb/packages.yaml completely right?

Indeed. As mentioned (I think I mentioned it, but who knows) I've been working on a revised implementation of the package upload in the background, which doesn't use that file anymore. That's basically done except for tests. But I guess there might be a few modifications needed as a result of this discussion (which is good). Although I'm reluctant to deleting packages.yaml from the server yet, because some users might still use a download function from a previous version of ScopeSim, that would break without that file. Some kind of warning would be worth it maybe...

  • My approach to dealing with multiple releases in one day was to cheat and just use days from the future 😇. We could even go beyond the 31th of a month. Your plan is better.

Invalid dates might produce an error in sim.download_packages() in scopesim>=0.6.0...

Now for the question: What does the version.yaml contain for the circular commits? I suppose it is the same as the last square commit, and thus the previous tag. That old version could become a problem if someone somehow ends up using a non-tagged commit as an instrument package, and they then encounter a bug. E.g. someone checking out feature_1 when it was still at 1-48a2da3 (the first commit in feature_1, before the .dev release), or a supervisor unzipping a tarball sent by a student. Then that user could complain that version 2023-04-20 is broken, because that is the version in their version.yaml, but in reality they are not running that stable version at all, but a modified version.

I was working under the assumption that the enduser will only use packages installed from a .zip file on the server, where the problem would not occur in that way, if the process is done (automated I should say) correctly. However, considering that sim.download_packages() explicitly and on purpose supports installing directly from GitHub (which is the test that regularly throws 403s), that assumption might be problematic. It would be nice to know how many users actually do that though. Anyway, it shouldn't matter! So on to the solutions...

There are several ways to 'fix' that outdated-version problem:

  1. Not include the version number in the source code at all, and generate the version (that is, version.yaml) automatically when publishing the package. E.g. if there is a tag, the zip file gets a version.yaml with the tag, and if there is no tag, the version will be based on the commit hash. This would keep the commit graph as above.
  2. Create the square commits each on their own branch, instead of on the main branch or on the feature branches. That way the main branch will never have an actual version number in the main or feature branches, just a placeholder like develop or so. The first commit graph below describes this option.
  3. Add a new commit immediately after the square commits, with an untagged and unreleased version number. That's kinda what I tried to do with ScopeSim etc by incrementing the patch version and adding the -alpha suffix. The second commit graph below describes this option.

When I started reading this, my first "gut feeling" made me gravitate towards option 1. Reading your next comment reassured me in that. Option 3 would be next-best for me.

I'm reluctant towards option 2, mostly because we would pollute our repo with an ever-growing number of branches. Even though those are dead-ends, if I understand correctly they would not be deleted, right? Maybe that's just personal preference, but I like keeping the number of branches low. For tags, it's a different story though.

I realized there will always be other places with the version number than version.yaml. E.g. the changelog, or notes in the documentation like added in version x.y, deprecated in version x.y, etc. And just before a release, these numbers can refer to the soon-to-be-released version. So option 1 above (with no version in version.yaml in any commit), would still require a commit that puts the correct version number in place in the documentation.

Most notably, the version number also appears in the name of the .zip file on the IRDB server. Those are used by sim.download_packages() (0.6.0 onwards) to resolve release="stable" and release="latest" (the download function being ignorant of what's going on with versions and tags in the repo). Which was a part of what motivated me to start this whole process that resulted in this discussion 🙃

But I'm confident we'll find a robust solution for e.g. version numbers in docs. That seems to be the smaller issue.

My main reluctance to advocate for option 1 was that this will tie us to git: e.g. if you'd do a git archive, it would be quite hard to figure out the version of that archive. But that is just not true: the changelog should have a top entry that summarizes the changes for the upcoming version (using a placeholder for the version string), and the next entry in the changelog should list the current-stable version. So you can trivially pinpoint the version of such a (hypothetical) git archive. So that is a point in favor of option 1 for me.

Sounds ..... good?

I also tried to put the release branches for stable versions above the line for the main branch, but that doesn't seem to be possible.

That is possible, I've done it recently while pondering about the workflow for ScopeSim (also with the release branch^^). Although it ends up looking weird sometime, mostly in how the lines split and merge. I think that while it's possible, it's not really meant to be done I guess.