MetPX / sarracenia

https://MetPX.github.io/sarracenia
GNU General Public License v2.0

Which version is stable? #139

Open petersilva opened 5 years ago

benlapETS commented 5 years ago

We would have to define what is considered stable for Sarracenia. Currently, when the flow test passes in one or more environments, we consider (as an act of faith) that Sarracenia is stable.

But is that really the case? Does it guarantee that no user will experience any major problem in their use case? What are those use cases? How many do we support? Is it possible to list them all, so we can control what is supported and what is not? If we can't, can we build a portrait of our typical users from our major clients to get such a list, and then define the metrics and rules for what a stable Sarracenia would be?

Basically, the more good and bad feedback from users we analyse and consolidate into those functionalities, the better we will be able to measure which version is stable.

For now, I'm sorry that I only have unanswered questions on this subject. It will take more interaction with users for me to understand their real needs and to judge what is stable.

Also, which quality attributes is Sarracenia trying to fulfill in terms of data exchange? Is it Reliability, Performance, Availability, Modularity, Testability...? How do we measure the goals we want to achieve with them, and what are the priorities? Once we have a clear understanding of that, we will be able to tell whether our architecture answers all those needs and what needs to be fixed at that higher level. Then we will know without a doubt whether or not we have a stable version.

At least, this is how I learned to evaluate a project in software engineering, but it is not a one-person job. Defining all this would involve the developer team, management, the clients, the contributors, ... So, where do we start?

petersilva commented 5 years ago

background and current testing strategy:

There was a major refactoring at the end of 2017, basically completed by Jan. 2018, and releases after that point have essentially been bug-fixes*. Configurations for versions prior to that point have incompatibilities with >=2.18.01 (releases on or after January 2018), so the move to a current version needs care. Once at a recent version, all upgrades should be seamless. The only impact of upgrades should be getting bugfixes, and there is no reason for fear, but the analysts see the schism between before and after the refactor, and it results in a great deal of caution.
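The boundary described above can be checked mechanically. The following is only a minimal sketch, assuming version strings shaped like the ones in this thread (for example `2.18.10b2`); the helper names `release_tuple` and `needs_config_review` are hypothetical and are not part of Sarracenia.

```python
import re

# Assumption: first post-refactor release, as stated in the comment above (2.18.01).
FIRST_POST_REFACTOR = (2, 18, 1)


def release_tuple(version):
    """Return the leading numeric part of a version string as a tuple of ints."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def needs_config_review(installed):
    """True when `installed` predates the refactor, so configurations need care."""
    return release_tuple(installed) < FIRST_POST_REFACTOR
```

With this sketch, `needs_config_review("2.18.10b2")` is false, while any version string that sorts before 2.18.01 would be flagged for a manual review against doc/UPGRADING.rst.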

In other words, the analysts don't necessarily believe the releases are just bugfixes, and the only thing that will convince them is time and good releases. My guess is that whatever investment we make in unit tests and the flow test will improve release quality and give analysts confidence in the released versions, but that takes time to prove.

ddsr*.cmc is the most critical cluster, and it is running a version >= 2.18.10 (2018/October), having been upgraded at least once in 2018. So it is on the post-2018 bandwagon, and since it has hundreds of configurations operating at very high volume, the version running there is likely the one we have the most confidence in.

For now, the only thing I can suggest is that we have a tag (that moves) and analysts vote on a stable version... we just move the pointer as analysts' opinions change. On the other hand, we have been in beta for a year or more, and at some point we will probably just declare it stable, and releases should become rarer.

petersilva commented 5 years ago

My perception is that the overriding primary concern of analysts trying to deploy is a fear of regressions and changes that affect their configurations (because testing for them is hard, potentially involving months of stabilizing.) Analysts will comment: but there is all sorts of new stuff in every release ... yes, they are either:

So the basic idea is that there are no changes that will affect existing configurations, except where such a change was explicitly requested by operational analysts (#80 is, I think, the only case of a change in config behaviour since Jan. 2018).

So the new stuff is very conservative, and the analysts' main concern is regressions. We do have an example of a regression, in v2.19.01b1, where in some cases remove does not work. The regression was introduced by a bugfix gone wrong, so there is still reason for analyst caution. That would appear to be the sole such example. The type of breaking changes the analysts are looking for are documented in doc/UPGRADING.rst, and there is little such information from versions in 2018 precisely because very few versions had any sort of breaking change.

petersilva commented 5 years ago

another regression in v2.19.04 releases... fixed by v2.19.09

petersilva commented 5 years ago

another regression was the timeout in accelerator plugins in v2.19.09b1 fixed by v2.19.09b2.

petersilva commented 5 years ago

another regression is #268, introduced in 2.19.09b1... fixed in git (not released.) It got accepted even though python3.4 was failing on travis.com. As we get more confident in travis, we should heed it more.

petersilva commented 4 years ago

The current stable version is 2.20.02b1. It has no known regressions and is now widely deployed in critical and complex configurations.

petersilva commented 4 years ago

I just added a stable tag pointing to v2.20.02b1. We can move it whenever it makes sense.
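For readers unfamiliar with a moving tag, the sketch below shows the two git commands usually involved. It assumes the stable pointer is an ordinary git tag named `stable` and that the remote is called `origin`; the helper `move_tag_commands` is hypothetical and only builds the command lines, it does not run them.

```python
def move_tag_commands(tag, target, remote="origin"):
    """Build the git commands that repoint `tag` at `target` and publish it.

    Assumption: `tag` is a plain git tag and `remote` is the shared repository.
    """
    return [
        # Recreate the tag locally, overwriting the old pointer.
        ["git", "tag", "--force", tag, target],
        # Overwrite the tag on the remote as well.
        ["git", "push", "--force", remote, f"refs/tags/{tag}"],
    ]


for command in move_tag_commands("stable", "v2.20.02b1"):
    print(" ".join(command))
```

The lists can be passed to `subprocess.run(command, check=True)` inside a clone. Clones that already fetched the old tag generally need a forced tag fetch to see the new pointer, which is one cost of a tag that moves.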

petersilva commented 4 years ago

I just moved the stable tag to point to v2.20.02b3. b1 actually has a bad bug, #313, likely present since at least 2.19.04, but testing changes prevent git bisect.

petersilva commented 4 years ago

sigh... because of #318 we are moving the stable pointer back to 2.18.10b2...

petersilva commented 4 years ago

OK, in light of the start of v3, and in consideration of the regressions seen, I have created a new strategy, described here:

more information here: https://github.com/MetPX/sarracenia/blob/master/doc/Dev.rst#repositories

petersilva commented 4 years ago

current stable version is v2.20.08p1 (or post1 ... slight error in release results in pypi using post1, and debian p1)
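The suffixes used on version numbers in this thread (`b1`, `b2`, `post1`) correspond to categories in Python's PEP 440 versioning scheme, under which `a`, `b` and `rc` mark pre-releases and `post` marks a post-release. The sketch below is a simplified classifier, not a full PEP 440 parser; the function name `classify` is hypothetical, and the Debian-style `p1` spelling is deliberately reported as unrecognized rather than guessed at.

```python
import re

# Simplified pattern: numeric release, optional a/b/rc pre-release, optional post-release.
_VERSION = re.compile(r"^v?\d+(?:\.\d+)*(?:(a|b|rc)\d+)?(\.?post\d+)?$")


def classify(version):
    """Classify a version string as pre-release, post-release, final, or unrecognized."""
    match = _VERSION.match(version)
    if match is None:
        return "unrecognized"
    if match.group(1):
        return "pre-release"
    if match.group(2):
        return "post-release"
    return "final"


for v in ("v2.20.02b1", "2.20.08.post1", "2.20.08p1"):
    print(v, "->", classify(v))
```

This matters for installation because pip skips pre-releases by default unless `--pre` is given or a pre-release is requested explicitly.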

petersilva commented 10 months ago

discussion:

Currently discussion is around sr3... and sr3 has not reached production maturity yet... working towards a stable release, which means a rapid deployment process as issues are discovered. Later there should be a slower rhythm of releases. So .. For now... we should be using pre-release everywhere... until we get to a real stable version.

further work:

petersilva commented 10 months ago

"rc1" version suffix seems to do the right thing for pre-releases on pypi, so used for launchpad and github as well.

petersilva commented 1 month ago

v3 should be considered stable at this point, and v2 should be considered legacy.