NOAA-EMC / UPP

switch to using X.Y.Z as a version number, instead of named tags #976

Closed edwardhartnett closed 2 months ago

edwardhartnett commented 3 months ago

We are using named tags for releases instead of a version number. The original concept seems to be that one could check out everything with the same tag and that's the release. For example, the current UPP release is "Release UPP-SRW-v2.2.0".

Instead, as all other software projects do, we should release X.Y.Z versions of the software. For a very detailed explanation see: https://semver.org/.

At NOAA, we do not install everything from the tagged release name, as was apparently intended when this scheme was devised. We have 100+ packages that have to be installed, and all of them follow the X.Y.Z versioning (except UPP and a few other NOAA projects). So this system of using the same tag name everywhere is not used. It is simply inadequate as a way of releasing software. It does not help at all in the installation and distribution of UPP, or SRW, or any NOAA application.

UPP needs to switch to using X.Y.Z version numbers. The next release should be called "v2.0.0" (or some other starting version number). Future releases of the SRW, instead of getting a tag specially named for them, will use one of the versioned releases. (So, for example, SRW-2.3.0 would require UPP-2.0.0 in their cmake file, and that would be encoded in their spack file, and that's how spack installs packages on NOAA machines. SRW-2.4.0 might require UPP-2.1.0, etc.)

This also means that backward compatibility must be maintained. That is, if we release upp-2.0.0 and it includes some subroutine, and we then remove that subroutine for the upp-2.1.0 release, it may cause problems. If we change how that subroutine operates (other than bugfixes), we may cause problems. The implication is that upp-2.1.0 provides everything that upp-2.0.0 provides, plus perhaps some additional subroutines and bugfixes. Just like other software packages. This is so that every NOAA application does not require its own install of UPP. Just like netCDF, all apps should be able to use the same installed version of UPP on WCOSS2.
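To illustrate the compatibility rule described above, here is a minimal Python sketch of how a consumer could decide whether an installed X.Y.Z release satisfies its requirement. The names `Version`, `parse_version`, and `satisfies` are hypothetical and are not part of UPP, spack, or CMake. The rule assumed here is the usual semver reading: same major version, and not older than the required version.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """An X.Y.Z version; tuples compare field by field, left to right."""
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'X.Y.Z' or 'vX.Y.Z' into a Version (hypothetical helper)."""
    parts = text.strip().lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return Version(major, minor, patch)


def satisfies(installed: str, required: str) -> bool:
    """Return True if `installed` can stand in for `required`.

    Assumption: backward compatibility holds within a major version,
    so any release with the same major number that is at least as new
    as the requirement is acceptable.
    """
    have, need = parse_version(installed), parse_version(required)
    return have.major == need.major and have >= need
```

Under this assumption, an application that requires 2.0.0 can use an installed 2.1.0, while an application that requires 2.1.0 cannot use 2.0.0, and a 3.0.0 install is not assumed to be compatible with either.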

Correct versioning would be helpful for installing UPP, both on WCOSS2 and on NOAA R&D machines. It would also cause UPP to function like other software projects in the stack, reducing confusion and risk.

When we manage our software like everyone else, we can take better advantage of tools, get better support from the installation platforms, and reduce maintenance and distribution costs.

gspetro-NOAA commented 3 months ago

Hi @edwardhartnett,

For any standalone releases of UPP this shouldn't be an issue, since we already basically do this with v9/10/11. Production releases are handled by EMC, and our EMC code managers can make a decision there, but I believe they already use the X.Y.Z naming convention.

For public releases of UFS applications, the purpose of naming like upp-srw-v2.2.0 is to make it clear which components/tags are part of which application release. The naming scheme you're proposing obfuscates that relationship; it doesn't mean we can't make a change, but fundamentally, the way public UFS application releases happen is different from what you describe. For example, the idea that future releases of, e.g., the SRW App, would use a versioned release of the UPP implies that there is a versioned release already available for them to use (and that releases keep pace with development). In reality, there are no standalone UPP releases planned, although development is ongoing. When the SRW App (or another app) is released, the subcomponent hashes available and tested in the application at that time are the ones we tag. This is at least partly out of necessity, since there are so many components within the UFS Weather Model, most of which we do not control. We cannot force code managers outside of EPIC to issue releases for all of the components.

Ultimately, changing the naming convention for public releases would require a change for all EPIC-managed UFS repositories for consistency. For such a change to make sense, we would likely have to change our public release processes. It seems like you are suggesting a process whereby we essentially release components as packages (like NetCDF). I'm not sure to what extent our teams are able or willing to make that shift, although if that is your ask, I can send the request to management. I know that the Unified Workflow (UW) Team has taken this route with success, so it's not impossible but would be a significant shift. There are feasibility questions particularly when involving institutional repositories outside of EPIC/ufs-community.

In short, if you want EPIC to change the naming scheme for its public application releases, it would likely be best to discuss with EPIC code managers, particularly @FernandoAndrade-NOAA for UPP, @MichaelLueken for the SRW, @chan-hoo for Land DA, and @jkbk2004 for the WM. I'm not convinced it's worth changing the naming unless we are also going to change our release process, and the kind of change you seem to be suggesting would likely require us to get management involved. That said, please do clarify if I've misunderstood what you're suggesting!

Best, Gillian

edwardhartnett commented 2 months ago

I am suggesting that the UFS and UPP would benefit from decoupling UPP from the other repos, and releasing it as a versioned library, just like netCDF and all the NCEPLIBS.

If you continue as you are going, you will end up with many different branches of UPP, all of which have to be maintained. This does not scale. When there are 10 different releases, and a bug that's in all of them, the cost of fixing that bug is an order of magnitude greater. Also there are then 10 more failure points - the fix has to be done correctly everywhere. With manual testing, the test burden also becomes unsustainable and unreasonable.

Instead, as netCDF does, maintain full backward compatibility and fix bugs in the main release. All users are then instructed to update to the latest release, instead of trying to maintain fixes on more than one branch.

For example, the current operational GFS uses netcdf-c-4.7.4. But if a bug is discovered, we (the netCDF developers) will not release a patch for netcdf-c, we will instead instruct NOAA to update to the latest release, which has all current bugfixes. If the bug still exists (i.e. we didn't already fix it since the 4.7.4 release), we will fix it and, if necessary, do a new netcdf-c release. But we will not maintain historical branches based on old releases. Since we guarantee backward compatibility (and enforce it with full testing in our CI), this is a safe operation.

Of course, with netCDF, no other solution is possible. It is used in thousands of operational systems all over the world, and we could not maintain a branch for each of them. But it demonstrates that important operational software can be managed such that upgrading to a new release is safe.

This is obviously something that needs to be discussed and agreed to by all stakeholders. I suggest that you consider this as part of the conversation we are having regarding unit testing. I am not asking for this, I am suggesting it. It will reduce your workload moving forward, allowing more time for productive programming.

There may have been a time when having UPP as a submodule made sense, in that it made distributions easier, but with the new spack-based systems, this advantage disappears. It would be quite easy to install a UPP library as a dependency, just as we do for netCDF and the NCEPLIBS libraries. Releasing UPP as a submodule does not provide any advantage, but does tie you to an accumulating maintenance burden. UPP is a library, and would best be treated as one.

The path to separating UPP would be to do a versioned release for your next release, and then add whatever public-facing tags you like which point to that release. Then begin adding unit tests, until you are confident that your testing is comprehensive enough to make upgrading to a new version a safe operation for users.