petersilva opened this issue 5 years ago
We would have to define what is considered stable for Sarracenia. Currently, when the flow test passes in one or more environments, we consider (as an act of faith) that Sarracenia is stable.
But is that really the case? Does it guarantee that no user will experience a major problem in their use case scenario? What are those use cases? How many do we support? Is it possible to list them all, so we can control what is supported and what is not? If we can't, can we profile our typical users from our major clients so we get such a list? Then we may be able to define the metrics and rules that determine what a stable Sarracenia would be.
Basically, the more good and bad feedback we get from users that we can analyse and consolidate against those functionalities, the better we will be able to measure which version is stable.
For now, I'm sorry to say I only have unanswered questions on this subject; it will take more interaction with users for me to understand their real needs and to judge what is stable.
Also, what quality attributes is Sarracenia trying to fulfill in terms of data exchange? Is it Reliability, Performance, Availability, Modularity, Testability...? Also, how do we measure the goals we want to achieve with those, and what are the priorities? Once we have a clear understanding of that, we will know whether our architecture answers all those needs and what needs to be fixed at this higher level. Then we will know without a doubt whether or not we have a stable version.
At least, this is how I learned to evaluate a project in software engineering, but this is not a one-man job. Defining all this would involve the developer team, management, the clients, the contributors, ... So, where do we start?
background and current testing strategy:
developers make changes in issue branches, and never commit directly to master.
the developers run the flow_test, which includes all available unit tests, as well as end-to-end integration testing of a reproducible network of all existing components in a variety of configurations. A perfect result on the flow test (all tests passed) is a condition for acceptance into the master branch. The developer should pass the test on one laptop, and a second person on a different machine must reproduce the flow_test results as a pre-condition to a merge to master.
developers are invited and very welcome to add unit tests and more configurations to the flow test to enhance coverage.
the analysts who deploy generally will not look at development snapshots.
there are px-dev machines, a cluster with some development configurations, that, I think, Noureddine has set up to pull daily snapshots from master.
at some point, Peter decides a release is appropriate, and he goes through the procedure described in the Developers Guide to create it. This usually happens once or twice a month.
On the Wednesday following a new release, Noureddine has a procedure that updates the hpfx machines (science and collab) to the latest release. Note: this is currently broken because we wanted an explicit manual install to cover the change amqplib --> amqp, so we used the chance to make the package name more compliant (python3-metpx-sarracenia -> metpx-sarracenia). So we have a one-time need for a manual install.
users on hpfx can try things out before the version goes further.
from that point, analysts manually select the version to use on systems that are gradually more critical: more dev systems, then staging systems, and eventually operational systems.
It is typically months after a release is created that analysts doing operational deployments are comfortable. The operational deployments are typically meant for government-wide mission critical usage, and so caution is expected and prudent.
Once analysts are comfortable, they start recommending that version, and that is the answer sought to the "which version is stable?" question.
There was a major refactoring done at the end of 2017, basically completed by Jan. 2018, and releases after that point have essentially been bug-fixes. Configurations for versions prior to that point have incompatibilities with >=2.18.01 (releases on or after January 2018), so the move to a current version needs care. Once at a recent version, all upgrades should be seamless. The only impact of upgrades should be getting bugfixes, and there is no reason for fear, but the analysts see the schism between before and after the re-factor, and it results in a great deal of caution.
In other words, the analysts don't necessarily believe the releases are just bugfixes, and the only thing that will convince them is time and making good releases. My guess is that whatever investment we make in unit tests and the flow test will improve release quality and give analysts confidence in the released versions, but that takes time to prove.
ddsr*.cmc is the most critical cluster, and it is running a version from >= 2.18.10 (2018/October), having been upgraded at least once in 2018. So it is on the post-2018 bandwagon, and since it has hundreds of configurations operating at very high volume, the version running there is likely the one we have the most confidence in.
For now, the only thing I can suggest is that we have a tag (that moves) and analysts vote on a stable version... we just move the pointer as analysts' opinions change. On the other hand, we have been in beta for a year or more, and at some point we will probably just declare it stable, and the releases should be rarer.
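If we go that route, the mechanics are just ordinary git. A minimal sketch (the version shown is illustrative; "stable" is the proposed tag name):

```bash
# create or move a lightweight "stable" tag to whichever release the analysts endorse
git tag -f stable v2.18.10

# push it; --force is needed once the tag already exists on the remote
git push --force origin stable

# analysts (or install scripts) can then just check out whatever "stable" points to
git checkout stable
```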
My perception is that the overriding primary concern of analysts trying to deploy is a fear of regressions and changes that affect their configurations (because testing for them is hard, potentially involving months of stabilizing.) Analysts will comment: but there is all sorts of new stuff in every release... yes, but those changes are either:
new features to address issues encountered in operations, ( #80, #106, #140, and some others with only internal issues ), and in the last three months, much more thorough testing on windows, and
to make usage more consistent and obvious ( #80 again, #25, #92, #31 ) almost always without changing anything existing in use.
work to address additional use cases ( #54 (for DMS), in February the v03 work to permit wider adoption and compatibility with MQTT.) The v03 work, for example, should have no effect on the operational flows, which are entirely v02. Other contributions are in the form of plugins to address additional use cases, which only affect those use cases (as no-one else is using those plugins.)
So the basic idea is that there are no changes that will affect existing configurations, except where such a change was explicitly requested by operational analysts ( #80 is, I think, the only case of a change in config behaviour since Jan. 2018. )
So the new stuff is very conservative, and the analysts' main concern is regressions. We do have an example of a regression, in v2.19.01b1, where in some cases remove does not work. The regression was introduced by a bugfix gone wrong, so there is still reason for analyst caution. That would appear to be the sole such example. The type of breaking changes the analysts are looking for is documented in doc/UPGRADING.rst, and there is little such information for versions in 2018 precisely because very few versions had any sort of breaking change.
another regression in v2.19.04 releases... fixed by v2.19.09
another regression was the timeout in accelerator plugins in v2.19.09b1 fixed by v2.19.09b2.
another regression is #268, introduced in 2.19.09b1... fixed in git (not released.) It got accepted even though python3.4 was failing on travis.com. As we get more confident in travis, we should heed it more.
The current stable version is 2.20.02b1. It has no known regressions and is now widely deployed in critical and complex configurations.
I just added a stable tag pointing to v2.20.02b1. We can move it whenever it makes sense.
I just moved the stable tag to point to v2.20.02b3. b1 actually has a bad bug, #313, likely present since 2.19.04, but testing changes prevent a git bisect.
sigh... because of #318 we are moving the stable pointer back to 2.18.10b2...
OK, in light of the start of v3, and in consideration of the regressions seen, we have created a new strategy, described here:
development with same QA tests still occurs on master branch.
master branch feeds Daily repository as before.
there is a new Pre-Release repository, also based on the master branch, that should be used on systems that are in use but not as sensitive as other systems... they can tolerate some testing.
the old stable repository is now based on a new branch, called v2_stable. This branch is updated from the master branch using release tags, so it merely promotes the version already tested in pre-release (see the sketch below).
more information here: https://github.com/MetPX/sarracenia/blob/master/doc/Dev.rst#repositories
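For illustration, promoting a release that has baked in pre-release could look something like the following; the branch name is the one mentioned above, the release tag is illustrative, and the recorded procedure is the one in Dev.rst, not this sketch:

```bash
# fast-forward v2_stable to a release tag already tested via the pre-release repository
git checkout v2_stable
git merge --ff-only v2.20.08
git push origin v2_stable
```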
current stable version is v2.20.08p1 (or post1... a slight error in the release results in pypi using post1, and debian p1)
discussion:
Currently discussion is around sr3... and sr3 has not reached production maturity yet. We are working towards a stable release, which means a rapid deployment process as issues are discovered. Later there should be a slower rhythm of releases. So, for now, we should be using pre-release everywhere... until we get to a real stable version.
further work:
"rc1" version suffix seems to do the right thing for pre-releases on pypi, so used for launchpad and github as well.
v3 should be considered stable at this point, and v2 should be considered legacy.
v3.0.56 is the current stable version. For sarrac (package name metpx-sr3c), it is 3.24.11.
as of 2024/11/20:
Current python stable version: 3.0.56
Current C stable version: 3.24.11
v2 is considered legacy. Anyone updating a v2 configuration is encouraged to migrate to sr3.
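For anyone verifying what is installed locally, a quick sketch (assuming a pip install of the python package, here guessed as metpx-sr3 on pypi, and the debian package metpx-sr3c named above for the C implementation):

```bash
# python (sr3) implementation installed from pypi
pip show metpx-sr3

# C implementation (sarrac) installed from the debian package
dpkg -s metpx-sr3c
```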