ControlSystemStudio / phoebus

A framework and set of tools to monitor and operate large scale control systems, such as the ones in the accelerator community.
http://phoebus.org/
Eclipse Public License 1.0

Release Guideline and Procedure #1380

Open shroffk opened 4 years ago

shroffk commented 4 years ago

Discussion:

It is great that over the last few months we have been able to have multiple releases of phoebus. The release process has been quick and simple, which should help us push out changes and bug fixes quickly to our users and improve the general user experience.

With phoebus moving into production at multiple sites (SNS and ESS have had it in the control room for a while now), I think this would be a good time to formalize and document the release procedure: the practices at each site, and the good practices for putting out a release that would keep everyone happy.

Initial thoughts. Release:

  1. pre release:

    • [ ] Notification or PR (the release plugin can be configured to push to a branch which we can merge via a PR with approval... the NSLS2 product, run-scripts, and Ansible-based deployment process need to be updated with new releases, so such a notification would be helpful)
  2. release process:

    • [ ] Releases should be tagged. (the release plugin does this already)
    • [ ] Source and common product binaries hosted on the GitHub project. (Instead of working hard on automating this, we could simply, as a community, upload Windows, Mac, and Linux platform tars.)
    • [ ] Change log and Release Notes
    • [ ] prepare for next release, update version etc. (we will have to co-ordinate and find a solution on how the pom and build.xml are both updated)
    • [ ] tech-talk announcement?

kasemir commented 4 years ago

Not sure it will be easy or even necessary to agree on large, common releases. The need to do an update likely differs for each site.

As an example, for SNS I needed #1367 which allows us to tweak certain scan server settings without requiring a restart. So local users might ask: Do I have this feature on my beamline? The answer is: Yes, for all scan servers with a build date after June 1. The version number 4.6.x is totally meaningless.

Historically, ITER required a release not because of the need for a specific feature, but mostly because the project calendar asked for a codac core release at a certain date.

ESS incremented the overall version number in #1102 "Due to deployment at ESS", #1281 "Need to do a release at ESS", #1347 without any description at all.

I don’t think it’s practical to increment version numbers, requiring everybody to update their deployment setups, just because one site needs an update right now. Nevertheless, each site may well need an update at certain times, and each site needs to know: Does this product contain bug fix X or feature Y?

The build date, or more accurately the date/git hash of the git checkout together with the git log up until that time, already defines what you have. We can already use that information; there is no need to call a meeting and ask if we can agree: Will tomorrow be Saturday, June 6? If the date or git hash is too cumbersome to track and you want a “version number” for your site, each site can add tags like “SNS_4.5.6” to the repo. The version that you see in the About dialog is basically a text which can be set during the build (for SNS, I set it to the build date).
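As a rough illustration of the "date plus git hash" idea, here is a minimal Python sketch that turns a commit time and hash into a single label that could be used as the About-dialog text. The function names and the label layout are assumptions made for this example; they are not part of the phoebus build. The only git feature used is `git log -1 --format=...` with the `%ct` (committer time as a Unix timestamp) and `%H` (full hash) placeholders.

```python
from datetime import datetime, timezone
import subprocess


def format_build_version(commit_epoch: int, commit_hash: str) -> str:
    """Combine a commit time (seconds since the epoch) and hash into one label.

    The label layout is a hypothetical choice for this sketch:
    UTC timestamp, a dash, then the first seven characters of the hash.
    """
    stamp = datetime.fromtimestamp(commit_epoch, tz=timezone.utc)
    return f"{stamp:%Y%m%dT%H%M%SZ}-{commit_hash[:7]}"


def head_build_version(repo_dir: str = ".") -> str:
    """Ask git for the committer time and full hash of HEAD in repo_dir."""
    fields = subprocess.run(
        ["git", "log", "-1", "--format=%ct %H"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout.split()
    return format_build_version(int(fields[0]), fields[1])
```

With a made-up hash, `format_build_version(1591012800, "0123abc4567def")` yields `20200601T120000Z-0123abc`. A build script could pass the result of `head_build_version()` to whatever mechanism sets the version text.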

ralphlange commented 4 years ago

Is your argument only about the maintenance/bugfix digit? I would generally agree. Or are you arguing against using numbered versions at all? I would strongly disagree.

ralphlange commented 4 years ago

fwiw: ITER has strict requirements on traceability. That's why for "external" products a released version is preferred over a git hash - under the assumption that a released version has a release note, complete and matching documentation etc etc. A date is not unique and does not provide an acceptable level of traceability.

shroffk commented 4 years ago

I think we could add our use cases and requirements first. It might be better to sort out the details once they are all collected.

At NSLS2

I have two different products, for the accl and the beamlines. An Ansible-based deployment solution builds the latest phoebus product and the alarm services on the target machines. Currently, we check out master and build that. In the accelerator control room there is talk of wanting a more rigidly controlled version/tag/release. The beamlines want the latest and greatest; being able to push out a bug fix in a few hours is their preferred mechanism.

kasemir commented 4 years ago

Hi Ralph, great Friday discussion, right?

I'm not specifically arguing for a certain numbering scheme, but trying to point to the issue:

I need the update from some PR 123. When I look at an installed product, I need to be able to tell: Does this include the update from PR 123, or is it old, and I need to replace it?
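The question "does this product include PR 123?" amounts to asking whether the merge commit of that PR is an ancestor of the commit the product was built from. Git can answer this directly with `git merge-base --is-ancestor`. The sketch below shows the same check in plain Python on a small in-memory history; the commit names and the `parents` mapping are invented for illustration.

```python
from typing import Dict, List


def includes_commit(parents: Dict[str, List[str]], build: str, wanted: str) -> bool:
    """Return True if `wanted` is `build` itself or one of its ancestors."""
    pending, seen = [build], set()
    while pending:
        commit = pending.pop()
        if commit == wanted:
            return True
        if commit not in seen:
            seen.add(commit)
            # Walk towards older history via every parent (merges have two).
            pending.extend(parents.get(commit, []))
    return False


# Hypothetical history: "c3" merges the PR branch "pr123" into the main line.
history = {"c3": ["c2", "pr123"], "c2": ["c1"], "pr123": ["c1"], "c1": []}
print(includes_commit(history, "c3", "pr123"))  # build made after the merge
print(includes_commit(history, "c2", "pr123"))  # build made before the merge
```

The check needs only the commit a product was built from, which is why recording the hash (or something that maps to it) in the product matters more than the number shown to users.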

The current "release" number doesn't tell me that.

ESS incremented the "release" number for their needs, so they can tell by the release number if the product includes what they care about or not. But those release numbers which are useful to ESS don't help me at all. Quite the opposite: I needed to accordingly increment numbers in my deployment scripts.

So I guess everybody cares about specific but different PRs. An update that's important to you doesn't matter to me and vice versa. Should we only increment "release" numbers for you? No. Do we increment them for every PR? Well, then we might as well use the date.

kasemir commented 4 years ago

One solution for your accl vs beamline could be: Beamlines use the date/time as version information. For accl, you add tags "ITER_ACCL_1.2.3".

I have the same basic idea, but on a different time line, so again SNS beam lines use the date/time as version, and I add tags "SNS_ACCL_3.4.5" to identify the more rigidly controlled updates.

I think something like that would be more useful than one site deciding what the release is for everybody, or everybody trying to agree on a common release cycle.

ralphlange commented 4 years ago

Date doesn't work because there is a (slight) chance that we do more than one thing per day.

ralphlange commented 4 years ago

Hash is good for traceability, but not sorted, so you can't tell what is newer.

kasemir commented 4 years ago

Sorry, when I say "date" I mean "date + time", and to avoid time zone confusion we can use the UTC time. It's basically the hash, but easier to compare. "Anything after June 1" is easier for a human to check than "anything after d789a7dcbefb316b1c697394f3b9f050eb87fdc4"
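To make the time zone point concrete, here is a small Python sketch of an "anything after June 1" check that normalises both timestamps to UTC before comparing. The function names are made up for this example, and the inputs are assumed to be ISO 8601 strings that carry an explicit UTC offset.

```python
from datetime import datetime, timezone


def to_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    return datetime.fromisoformat(stamp).astimezone(timezone.utc)


def built_at_or_after(build_stamp: str, cutoff_stamp: str) -> bool:
    """True if the build's checkout time is at or after the cutoff."""
    return to_utc(build_stamp) >= to_utc(cutoff_stamp)


cutoff = "2020-06-01T00:00:00+00:00"
# Local date is still May 31, but in UTC this is already June 1, 01:00.
print(built_at_or_after("2020-05-31T21:00:00-04:00", cutoff))
# Local date is June 1, but in UTC this is still May 31, 23:00.
print(built_at_or_after("2020-06-01T01:00:00+02:00", cutoff))
```

The two sample builds show why a bare local date is ambiguous across sites, while a UTC timestamp gives every site the same answer.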

ralphlange commented 4 years ago

Major and minor numbers - per convention - indicate different kinds of compatibility. Losing that, and having no idea which update breaks compatibility and which doesn't, would be a disaster. Also, supporting multiple release series is hard if you can't tell them apart.

ralphlange commented 4 years ago

Date and time is not the hash, btw. Not at all.

ralphlange commented 4 years ago

Again, for a specific release series, e.g. 4.5.x, 'date-time-hash' would work for x, as long as they are increasing and uniquely identify a repository commit. Across release series and compatibility breaks - not really.
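One way to read this proposal is sketched below in Python: the major and minor numbers identify the release series, and the third field is a UTC timestamp plus an abbreviated hash. The `MAJOR.MINOR.STAMP-HASH` layout and all names are assumptions for this example; nothing here reflects an agreed phoebus scheme. Ordering is only defined within one series, and comparing across series is refused rather than guessed.

```python
from typing import NamedTuple


class BuildVersion(NamedTuple):
    major: int
    minor: int
    stamp: str   # UTC checkout time such as "20200601T120000Z"; sorts as text
    commit: str  # abbreviated git hash, kept for traceability only


def parse(label: str) -> BuildVersion:
    """Split a 'MAJOR.MINOR.STAMP-HASH' label into its parts."""
    major, minor, rest = label.split(".", 2)
    stamp, commit = rest.split("-", 1)
    return BuildVersion(int(major), int(minor), stamp, commit)


def is_newer(a: BuildVersion, b: BuildVersion) -> bool:
    """True if `a` is newer than `b`; both must belong to the same series."""
    if (a.major, a.minor) != (b.major, b.minor):
        raise ValueError("builds are from different release series")
    # Fixed-width UTC stamps compare correctly as plain strings.
    return a.stamp > b.stamp


old = parse("4.5.20200601T120000Z-d789a7d")
new = parse("4.5.20200605T090000Z-0a1b2c3")
print(is_newer(new, old))
```

Because the stamp has a fixed width and is always in UTC, plain string comparison gives chronological order, while the hash still pins the build to a single commit.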

kasemir commented 4 years ago

Right, maintaining several branches in parallel requires a lot of effort. I don't see that happening. At best, we have enough people to maintain the 'master' of the Eclipse-based legacy and the new CS-Studio.

ralphlange commented 4 years ago

How would you mark compatibility breaks?

kasemir commented 4 years ago

Well, Kunal changed the context menu API in #1375. From that PR on, it's different. Previous site-specific code needs to change. If somebody wasn't ready to follow, they'd have to move onto a branch with the old API.

If the organization was larger, you could have a committee to plan the API change, prepare the branch, and keep maintaining that branch for a certain time.

georgweiss commented 4 years ago

I'd like to better understand the apparently negative impact (of releasing from a site) on the deployment procedure reported by @kasemir. Can you please provide details?

kasemir commented 4 years ago

See link in Kunal's introductory post, https://github.com/shroffk/nsls2-phoebus/commit/547f88efc21c07e7ff1c6219370ac04016a9c252.