cloudfoundry / diego-notes

Diego Notes
Apache License 2.0

Rolling out Diego #14

Closed. onsi closed this issue 9 years ago.

onsi commented 9 years ago

I've added a proposal (now accepted):

https://github.com/pivotal-cf-experimental/diego-dev-notes/blob/master/accepted_proposals/rolling-out-diego.md

tl;dr: we move away from environment variables and use the stack instead. We simplify the CC by:

jbayer commented 9 years ago

@onsi your comment about the .NET stack using diego* may address another potential issue. Heroku now also has a cedar-14 stack based on Ubuntu 14.04 [1], so it would be nice to move to an Ubuntu 14.04 root file system for apps, and the stack is likely how we would do that too.

[1] https://devcenter.heroku.com/articles/cedar

cf-buildpacks-eng commented 9 years ago

That would work for the dotnet/windows team. I would love to not think about the env vars.

PS. Hopefully this does not make the stories to link buildpacks to stacks any more challenging.

In reply to James Bayer (Fri, Jan 16, 2015, 5:16 PM).

tedsuo commented 9 years ago

One question: If I have stack:diego or stack:not-diego in my manifest, and the operator changes the policy for who is on diego by modifying the stack to something different than what I've set, what happens when I push again with my manifest? Does it change the policy back?

onsi commented 9 years ago

That's correct, @tedsuo. That's a generic concern, I guess, with any approach we might take where the user can choose to opt in and the operator can use the same mechanism to try to force the user in.

I'd really like to move away from "operator sets a flag" to "operator opts things in incrementally" as that will keep the code simpler and make it easier to reason about how we migrate load from the DEAs to Diego.

I'm inclined to say that this is a people/communication issue and that operators will need to walk their users through opting in by a certain deadline and then dropping support for the DEAs (once that's done, if your manifest is wrong you'll get a failed push).

One other concern I have is: will "diego" be part of the stack forever and ever after this? Heroku's approach might say yes... but that's kinda sucky.

tedsuo commented 9 years ago

Looking at the proposal again, it seems clear that the goal is to give the operator control over which backend apps are running on. This makes sense, as it is the operator who is orchestrating the diego rollout. Using the stack parameter forces the operator to coordinate with the user: it is actually the user who is in control, as they can opt in and out at any time by changing their stack. It also forces the user to be aware of diego. I think this will cause trouble.

We want the operator to have complete control, and the users (beyond the initial beta users) ideally to be unaware or optionally aware of the switch. This makes the stack variable the wrong solution for the diego rollout. This is in contrast with changing the base image to 14.04, which is not transparent and must be coordinated with the user.

To modify your original post, I think that we can simplify the CC by:

The above things add great complexity and are not actually needed. Dumping them sounds good.

But, given that the backend swapout is ideally a behind-the-curtain change, there are problems implementing it by piggybacking on a user-facing variable.

If the control were entirely in the operator's hands, then a rollout could be properly managed, and could vary from customer to customer. For example, early users could explicitly opt in by emailing the operator, and the operator could email users as they gradually force them over. But since this all happens in a separate side channel, it is not mixed up with the control mechanism. So at another on-prem site, a different communication level or method could be chosen.

Of course, now I'm a jackass if I don't help propose an alternative. But first, is there agreement that we should take the user entirely out of this equation? Is there any reason to leave them in? This rollout approach worked very well with Tracker, btw.
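One way to read tedsuo's suggestion (and onsi's "operator opts things in incrementally") is a backend choice that lives entirely in operator-owned state. The Go sketch below is illustrative only: `RolloutPolicy`, `BackendFor`, and `OptInSpace` are hypothetical names, not Cloud Controller APIs, and opting in by space or by individual app is an assumed granularity.

```go
package main

// Backend identifies where an app's instances run (illustrative only).
type Backend string

const (
	DEA   Backend = "dea"
	Diego Backend = "diego"
)

// RolloutPolicy is a hypothetical, operator-owned record of what has been
// moved to Diego. Users never write to it, so a re-push with an old
// manifest cannot undo the operator's choice.
type RolloutPolicy struct {
	Default       Backend         // backend for anything not explicitly opted in
	OptedInSpaces map[string]bool // spaces moved over in bulk
	OptedInApps   map[string]bool // individual apps, e.g. early beta users
}

// BackendFor consults only operator-controlled state; nothing from the
// user's manifest is read. Lookups on nil maps safely return false.
func (p RolloutPolicy) BackendFor(appGUID, spaceGUID string) Backend {
	if p.OptedInApps[appGUID] || p.OptedInSpaces[spaceGUID] {
		return Diego
	}
	return p.Default
}

// OptInSpace moves a whole space over; an operator can call this
// incrementally while working through their users.
func (p *RolloutPolicy) OptInSpace(spaceGUID string) {
	if p.OptedInSpaces == nil {
		p.OptedInSpaces = map[string]bool{}
	}
	p.OptedInSpaces[spaceGUID] = true
}
```

Because a push never writes to this policy, the re-push problem raised above about stack:diego does not arise here. Under this sketch, flipping `Default` to `Diego` would be one possible end state.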

fraenkel commented 9 years ago

Regardless of what mechanism we use, I am trying to understand how we get to the end game. Suppose we slowly switch all the users to the "diego" stack; at some point everyone, or a large portion, will have switched. What do we expect the final setup to be then? Does this modified stack become the default stack? We can't exactly change the stack again.

onsi commented 9 years ago

Based on feedback from IPM/Retro/these comments and some feedback from Tony, I've modified the proposal.

Major changes:

onsi commented 9 years ago

Any opinions on this? I'd like to send something out to vcap-dev today. I think what we've got solves most of the problems/concerns I've heard articulated.

fraenkel commented 9 years ago

The only thing that bothers me is the magic when moving apps between DEA and Diego. I don't see it as being either easy or foolproof.

onsi commented 9 years ago

@fraenkel we have that stuff basically working today. But I agree - ideally folks wouldn't rely on it. Instead they'd do whatever blue-green deploy strategy they do today, except green would be diego=true and blue would be diego=false.

But if they don't do that, we'll have the next best thing: an immediate start on the new stack followed by an eventual stop on the old stack.
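The fallback onsi describes (start on the new backend immediately, stop on the old one eventually) can be sketched in Go. This is a minimal sketch under stated assumptions: `Runner`, `Migrate`, and `fakeRunner` are hypothetical names, not Cloud Controller or Diego APIs, and "eventually" is interpreted here as "once the new backend reports all desired instances running, within a bounded number of checks". If that never happens, the old instances are left alone.

```go
package main

import (
	"errors"
	"fmt"
)

// Runner is a hypothetical stand-in for one backend (DEA or Diego).
type Runner interface {
	Start(appGUID string, instances int)
	Stop(appGUID string)
	RunningInstances(appGUID string) int
}

// ErrNotHealthy means the new backend never reported enough running
// instances; the old backend is then left untouched.
var ErrNotHealthy = errors.New("new backend did not become healthy")

// Migrate starts the app on `to` right away and stops it on `from` only once
// `to` reports all desired instances running. The health wait (bounded by
// maxChecks) is an assumption of this sketch; wait runs between checks.
func Migrate(appGUID string, instances int, from, to Runner, maxChecks int, wait func()) error {
	to.Start(appGUID, instances)
	for i := 0; i < maxChecks; i++ {
		if to.RunningInstances(appGUID) >= instances {
			from.Stop(appGUID)
			return nil
		}
		wait()
	}
	return fmt.Errorf("migrating %s: %w", appGUID, ErrNotHealthy)
}

// fakeRunner is an in-memory Runner for illustration: each call to
// RunningInstances brings one more instance up, to the desired count.
type fakeRunner struct {
	desired map[string]int
	running map[string]int
}

func newFakeRunner() *fakeRunner {
	return &fakeRunner{desired: map[string]int{}, running: map[string]int{}}
}

func (f *fakeRunner) Start(app string, n int) {
	f.desired[app] = n
}

func (f *fakeRunner) Stop(app string) {
	delete(f.desired, app)
	delete(f.running, app)
}

func (f *fakeRunner) RunningInstances(app string) int {
	if f.running[app] < f.desired[app] {
		f.running[app]++
	}
	return f.running[app]
}
```

Starting before stopping means the app briefly runs on both backends; that overlap is the cost of avoiding downtime without a user-driven blue-green deploy.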