cncf / glossary

The CNCF Cloud Native Glossary Project aims to define cloud native concepts in clear and simple language, making them accessible to anyone — whether they have a technical background or not (https://glossary.cncf.io).
Apache License 2.0

Canary Deployment - improvement in "CNCF glossary" definition #736

Closed: antaloala closed this issue 1 year ago

antaloala commented 2 years ago

I think the current definition of "Canary Deployment" in the CNCF glossary is not entirely right because of its opening sentence:

Canary deployments is a deployment strategy that starts with two environments: one with live traffic and the other containing the updated code without live traffic

That sentence could be read as starting with two complete environments for the two versions (the current/stable one and the canary one), but that is not how it works (it would be too costly in terms of footprint). In a canary process you have a complete environment for the current/stable version and a minimal starting one for the canary version. As more real traffic is shifted to the canary version, more canary workload instances are needed and fewer current/stable instances are run, so, from a footprint point of view, the same amount of cloud resources is required at all times. (This is in fact how the CNCF projects Argo Rollouts and Flagger implement a canary upgrade.)
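The resource-neutral shifting described above can be sketched as follows. This is a minimal illustration, assuming a fixed budget of identical workload instances and a canary share expressed as a traffic weight in percent. The helper name `split_replicas` and the example weights are hypothetical; they do not reflect the actual Argo Rollouts or Flagger APIs.

```python
def split_replicas(total: int, canary_weight_pct: int) -> tuple[int, int]:
    """Split a fixed replica budget between the stable and canary versions.

    Hypothetical helper: the canary gets a number of instances roughly
    proportional to its traffic weight, rounded up so that any non-zero
    weight gets at least one instance. The stable version keeps the rest,
    so the total footprint never changes.
    """
    if not 0 <= canary_weight_pct <= 100:
        raise ValueError("canary weight must be between 0 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    canary = (total * canary_weight_pct + 99) // 100
    stable = total - canary
    return stable, canary


if __name__ == "__main__":
    total_replicas = 10
    # Hypothetical progression of canary traffic weights, in percent.
    for weight in (0, 10, 25, 50, 100):
        stable, canary = split_replicas(total_replicas, weight)
        print(f"weight={weight:3d}%  stable={stable:2d}  "
              f"canary={canary:2d}  total={stable + canary}")
```

Rounding up is one possible design choice; it guarantees that a canary step with a small but non-zero weight always has an instance to serve its traffic.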

What about replacing that first sentence with the following:

Canary deployments is a deployment strategy that starts with two environments: one with live traffic and the other containing the updated code, which is not yet receiving traffic and initially requires only a minimal amount of cloud resources

The following sentence could also be added at the end of the first paragraph:

During a canary upgrade, the total amount of cloud resources (compute, memory, and storage) required by the stable plus canary environments stays (almost) the same as before the upgrade started
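The "(almost) the same" condition in the proposed sentence could be checked roughly like this. The sketch assumes that each version declares fixed per-instance requests for CPU (millicores), memory (MiB), and storage (GiB), and that a 10% deviation per dimension still counts as "almost the same". The `Resources` class, the `footprint_within_tolerance` function, the figures, and the tolerance are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Resource requests of one instance, or of a group of instances."""
    cpu_millicores: int
    memory_mib: int
    storage_gib: int

    def scaled(self, count: int) -> "Resources":
        # Footprint of `count` identical instances.
        return Resources(self.cpu_millicores * count,
                         self.memory_mib * count,
                         self.storage_gib * count)

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpu_millicores + other.cpu_millicores,
                         self.memory_mib + other.memory_mib,
                         self.storage_gib + other.storage_gib)


def footprint_within_tolerance(baseline: Resources, current: Resources,
                               tolerance: float = 0.10) -> bool:
    """True if every dimension of `current` is within `tolerance` of `baseline`."""
    pairs = [(baseline.cpu_millicores, current.cpu_millicores),
             (baseline.memory_mib, current.memory_mib),
             (baseline.storage_gib, current.storage_gib)]
    return all(abs(cur - base) <= tolerance * base for base, cur in pairs)


if __name__ == "__main__":
    stable = Resources(cpu_millicores=500, memory_mib=512, storage_gib=1)
    # Hypothetical: the new version requests slightly more memory per instance.
    canary = Resources(cpu_millicores=500, memory_mib=544, storage_gib=1)

    baseline = stable.scaled(10)                  # before the upgrade
    during = stable.scaled(7) + canary.scaled(3)  # 3 of 10 instances on the canary
    print(during, footprint_within_tolerance(baseline, during))
```

The tolerance is what makes room for the "almost": the instance count stays constant, but the new version's per-instance requests may differ slightly from the old one's.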

starlightknown commented 2 years ago

I would like to work on this issue

seokho-son commented 2 years ago

Hi @antaloala Thanks for your interest in the CNCF Glossary Project! Your proposal is being considered by admins and glossary approvers :) Please wait for triage.

Hello @starlightknown :) I assume the author of this issue (@antaloala) is willing to work on it. If not, you can be assigned. Either way, we need to triage this issue. :)

JasonMorgan commented 2 years ago

Looks good!

Please link the PR to the issue.

antaloala commented 2 years ago

Thanks @JasonMorgan, @starlightknown and @seokho-son for looking into this (and sorry for having been silent since I posted the issue).

CathPag commented 2 years ago

@antaloala, just following up on this.

iamNoah1 commented 1 year ago

TBH, I am not sure this small tweak is really necessary to understand what a canary deployment is. What do you think @CathPag, @seokho-son @jihoon-seo @IdealUsrname?

CathPag commented 1 year ago

Agreed. I think the "what it is" section is pretty good, actually. I'd leave it as is too.

antaloala commented 1 year ago

IMO (influenced by the Flagger and Argo Rollouts implementations), an in-service software upgrade procedure must have three things to be considered a canary one:

  1. The traffic handled during the upgrade procedure to validate the canary/new software instances is production traffic: a well-known share of it is routed to the new/canary instances and the rest to the old/stable workload instances, with the percentages changing as the canary upgrade progresses.

  2. The total amount of (cloud) resources required by the new/canary workload instances plus the old/stable ones is kept (more or less) constant throughout the whole canary upgrade.

  3. There are several 'validation steps' at which it is decided whether the canary upgrade should continue (eventually promoting the canary version to become the stable one) or be aborted entirely (because the new/canary version does not meet expectations).

The second point is missing from the current definition.
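The three properties listed in the comment above can be illustrated with a small simulation of a canary controller loop. This is a minimal sketch, not how Flagger or Argo Rollouts are implemented: the `run_canary` function, the step weights, the simulated error rate, and the success-rate threshold are all hypothetical. Property 1 is modeled by routing a weighted share of simulated production requests to the canary, property 2 by carving canary instances out of a fixed replica budget, and property 3 by a validation gate at each step that either continues or aborts.

```python
import random


def run_canary(total_replicas: int,
               step_weights: list[int],
               canary_error_rate: float,
               min_success_rate: float = 0.99,
               requests_per_step: int = 1000,
               seed: int = 0) -> str:
    """Simulate a canary upgrade; return "promoted" or "aborted"."""
    rng = random.Random(seed)
    for weight in step_weights:
        # Property 2: canary instances come out of a fixed replica budget,
        # so stable + canary always equals the original total.
        canary = (total_replicas * weight + 99) // 100
        stable = total_replicas - canary
        assert stable + canary == total_replicas

        # Property 1: each simulated production request is routed to the
        # canary with probability weight/100, the rest go to stable.
        canary_requests = 0
        canary_failures = 0
        for _ in range(requests_per_step):
            if rng.random() < weight / 100:
                canary_requests += 1
                if rng.random() < canary_error_rate:
                    canary_failures += 1

        # Property 3: a validation gate decides whether to continue or abort.
        if canary_requests:
            success_rate = 1 - canary_failures / canary_requests
            if success_rate < min_success_rate:
                # Abort: traffic and instances go back to the stable version.
                return "aborted"

    # Every gate passed: the canary version becomes the new stable one.
    return "promoted"


if __name__ == "__main__":
    steps = [10, 25, 50, 100]  # hypothetical canary traffic weights, in percent
    print(run_canary(10, steps, canary_error_rate=0.0))  # healthy canary
    print(run_canary(10, steps, canary_error_rate=0.2))  # faulty canary
```

A fixed `seed` keeps the simulation reproducible; a real controller would read success rates from its metrics provider instead of sampling them.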

iamNoah1 commented 1 year ago

I feel like I said something like this before. While your points might be true, the CNCF Glossary focuses on making complex cloud native topics easy to understand, especially for non-technical folks. IMO the amount of technical detail you suggest is not necessary to understand what a canary deployment is.

CathPag commented 1 year ago

@antaloala, please refer to the "minimal viable definition" section in our style guide. As Noah stated, your points are valid but introduce unnecessary complexity for our purpose. So, while they are correct, I'll close this PR to keep our focus on simplicity.

I hope it makes more sense now that you have had the chance to review the style guide. We truly appreciate your contribution and hope you can contribute to another term now that we have clarified the Glossary's goals.

antaloala commented 1 year ago

Thanks for your time on this. Yes, given the style guide (sorry, I did not know about it), I agree that adding all these details would not work. At the same time, though, without this information the CNCF glossary definitions cannot be used as a reference to decide whether a given upgrade procedure qualifies as a canary upgrade.

Would it be possible, at some point in the future, to add some kind of 'extended definition' link to each term that, only when clicked, expands the definition text with these additional but relevant details (only for the terms that need it)?