onsi closed this issue 9 years ago
Looks good.
CC has a first class notion of stacks. How would this play out? Right now you have to tell CC up front the stacks in play which is annoying.
Changes to your "tags" would require a restage? Will we respect a stack:trusty as a stack or not? Do the access controls for PPs follow ASGs?
Will there be a way to see all the tags exposed in Diego?
@fraenkel In order to model stacks as placement pools, we would need to change the placement pool concept to allow both pools that the user can set and pools that the operator controls, and to define conflict resolution: in today's world a developer can specify a stack, but as written by @onsi the placement pool for an app is derived from the space it's in.
@jbayer Sure. Which will mean that we will remove any PP tag that begins with stack:, so they cannot do something stupid.
@fraenkel - a reply point-by-point:
Changes to your "tags" would require a restage?
From Diego's perspective you cannot modify tags on a DesiredLRP; you must create a new one. We are free to decide how CC interacts with this API. We don't have a strong distinction in CC today between "I've made a change that requires a restage", "I've made a change that requires a restart", and "I've made a change that requires neither". This is a source of great complexity and confusion between the CLI & CC, which is unfortunate. Strictly speaking, changing tags would only require a restart; in today's world, however, I think we tend to express that as a full-blown restage.
Will we respect a stack:trusty as a stack or not?
Who's the "we" in that sentence? Diego knows nothing about stacks, but you do have to provide a rootfs when you deploy the Cell. I imagine we would tag Cells based on what rootfs they have, and using "stack:lucid64" and "stack:trusty64" seems like a natural tagging that jibes with CC.
Do the access controls for PPs follow ASGs?
I don't have any strong opinions here. It makes sense to me that they be as parallel as possible (except, perhaps, that we wouldn't distinguish between staging and running).
Will there be a way to see all the tags exposed in Diego?
The Receptor API's /v1/cells endpoint will include tags in its response. So no /tags endpoint, but close enough for now...
As for:
CC has a first class notion of stacks. How would this play out? Right now you have to tell CC up front the stacks in play which is annoying.
I said this in the writeup:
Finally: I imagine stack will remain a first-class concept in the CC. Either the CC-Bridge or the CC itself will need to fold the stack into the PP by appending it to the Require field.
Telling CC up front about the stacks is annoying, I agree - but if you give users the ability to specify arbitrary stacks then having validations (by e.g. providing stacks up-front) is probably necessary.
It's a little ugly but I think this picture could work: the constraint for an application is constructed by the CC as follows (in pseudocode):

```
app.FinalPlacementPool = {
  Require:  app.space.PlacementPool.Require + "stack:{app.Stack}",
  Disallow: app.space.PlacementPool.Disallow
}
```
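To make that concrete, here is a hedged Go sketch of folding the stack into the final pool. The type and field names are illustrative, taken from the pseudocode above, not from CC's actual model:

```go
package main

import "fmt"

// PlacementPool mirrors the proposal's Require/Disallow tag lists.
type PlacementPool struct {
	Require  []string
	Disallow []string
}

// App carries the CC-managed stack and the pool inherited from its space.
type App struct {
	Stack     string
	SpacePool PlacementPool
}

// FinalPlacementPool folds the app's stack into the space's pool by
// appending a "stack:" tag to the Require list, as the pseudocode describes.
func (a App) FinalPlacementPool() PlacementPool {
	require := append(append([]string{}, a.SpacePool.Require...), "stack:"+a.Stack)
	return PlacementPool{
		Require:  require,
		Disallow: a.SpacePool.Disallow,
	}
}

func main() {
	app := App{
		Stack:     "lucid64",
		SpacePool: PlacementPool{Require: []string{"zone:prod"}},
	}
	fmt.Println(app.FinalPlacementPool().Require) // [zone:prod stack:lucid64]
}
```

Copying the space's Require slice before appending keeps the space-level pool unmodified when many apps derive their final pool from it.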
@onsi
Changes to your "tags" would require a restage?
The one sore point is again stacks. If PPs push me toward a Cell which provides a stack, switching PPs could push me to a Cell with a completely different stack. I would assume in this case we would restage? This could only happen using the Diego APIs and not CC.
Will we respect a stack:trusty as a stack or not?
From CC, a PP contains a set of "tags". One of those tags could be "stack:trusty". We didn't say you couldn't have a tag that begins with stack:. Would we disallow stack: at CC, or also at the Diego API?
Do the access controls for PPs follow ASGs?
I am a bit concerned about the usability of PPs. It seems like it's for admins only. I would like to see the set of use cases we are trying to solve, to verify that admins are truly the only ones involved. I can imagine use cases where that isn't true (e.g. special hardware, updated software) that a space/app dev would like to experiment or test against. It would seem that it's doable with admins only, just a bigger pain, since the Org manager has to create the space and then the CF operator needs to apply PPs to it.
I believe you answered what I wrote in your last section above.
Will there be a way to see all the tags exposed in Diego?
/v1/cells won't cut it, not when you have 400+ cells. But I was just curious.
The one sore point is again stacks.
I think stack and PP are separate concepts in CC and in CC's API. Under the hood, CC combines stack + PP to produce the final PP that's sent to Diego.
CC could have rules around disallowing PP entries that have stack: in them.
With this we would have the flexibility in CC to do things like:
Does that make sense? I can update the doc to clarify this once we get some consensus.
I guess I would like to see a set of use cases that we are trying to solve to verify that admins are truly the only ones involved
I agree that there are some use cases where this would be developer-driven (your example of hardware stacks is a good one).
The primary concern, though, is one of security/SLA-like behavior: things like separating prod from staging environments, etc.
/v1/cells won't cut it, not when you have 400+ cells. But I was just curious.
Yeah, but /v1/tags will be expensive when backed by etcd. If/when we move to a relational DB we'll be able to more sanely support /v1/tags.
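In the meantime, the tag set can be derived client-side from the cells listing. A sketch, where the CellResponse shape is an assumption rather than the Receptor's exact API:

```go
package main

import "fmt"

// CellResponse is an assumed shape for one entry in the Receptor's
// /v1/cells response; the real API may differ.
type CellResponse struct {
	CellID string
	Tags   []string
}

// UniqueTags collapses per-cell tags into the deduplicated set an
// operator would otherwise want from a dedicated /v1/tags endpoint.
func UniqueTags(cells []CellResponse) map[string]bool {
	tags := map[string]bool{}
	for _, c := range cells {
		for _, t := range c.Tags {
			tags[t] = true
		}
	}
	return tags
}

func main() {
	cells := []CellResponse{
		{CellID: "cell-1", Tags: []string{"stack:lucid64", "zone:a"}},
		{CellID: "cell-2", Tags: []string{"stack:trusty64", "zone:a"}},
	}
	fmt.Println(len(UniqueTags(cells))) // 3
}
```

With 400+ cells the cost is one large response and a linear pass, which is workable for an occasional operator query even if it is not elegant.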
Stacks in CC
I think we do need to prevent stupidity and block stack: prefixes.
Since we are relying on CC's management of stack anyhow, we get this restage/restart business for free.
The only thing we have to document is the stack: bit.
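A minimal sketch of the kind of validation CC could apply to user-supplied tags (illustrative, not actual CC code):

```go
package main

import (
	"fmt"
	"strings"
)

// ValidateTags rejects user-supplied placement pool tags that use the
// reserved "stack:" prefix, which CC manages on the user's behalf.
func ValidateTags(tags []string) error {
	for _, t := range tags {
		if strings.HasPrefix(t, "stack:") {
			return fmt.Errorf("tag %q uses the reserved stack: prefix", t)
		}
	}
	return nil
}

func main() {
	fmt.Println(ValidateTags([]string{"zone:prod"}))      // <nil>
	fmt.Println(ValidateTags([]string{"stack:trusty64"})) // error
}
```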
hardware as tags
This one brings forth a whole set of issues, starting with the admin-vs-user question we have already discussed. If you dig deeper, however, you now have a potential restage. As an end user, do I even know what the tags of the currently running app are? Yes, we have it in Diego-land, but from CC we have nothing. I re-read your statement all the way above and was wondering how anyone could tell whether they should restart/restage their app due to tag changes.
tags
You are correct: building the complete list from /v1/cells is trivial enough. Let's see what people really need.
I've updated the proposal further clarifying how stacks & placement pools relate.
I'd like to e-mail vcap-dev with this soon.
On this:
hardware as tags
I wonder if this should really be a stack notion instead of a placement pool notion, i.e. you have the GPU stack? I'm not sure.
Worst case we say that PP ∆ always triggers a restage. Slightly better: we teach the CC about which tags require a restage vs not (could see that getting hairy real quick, though).
I imagine we ship this and then get feedback around how people are actually using it and then we'll know ;)
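The "slightly better" option, teaching CC which tags require a restage, could be as simple as a lookup on the tag's key. A sketch under that assumption (the restage list and function are hypothetical):

```go
package main

import (
	"fmt"
	"strings"
)

// restageKeys is a hypothetical CC-side list of tag keys whose change
// forces a restage rather than a mere restart.
var restageKeys = map[string]bool{"stack": true}

// ActionForTagChange inspects changed "key:value" tags and returns
// "restage" if any key is in the restage list, otherwise "restart".
func ActionForTagChange(changed []string) string {
	for _, t := range changed {
		key := strings.SplitN(t, ":", 2)[0]
		if restageKeys[key] {
			return "restage"
		}
	}
	return "restart"
}

func main() {
	fmt.Println(ActionForTagChange([]string{"zone:b"}))         // restart
	fmt.Println(ActionForTagChange([]string{"stack:trusty64"})) // restage
}
```

The hairiness onsi anticipates shows up as soon as the decision depends on more than the key, e.g. hardware tags where only some value changes invalidate staged droplets.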
I agree, but we do need to come to terms on this issue. I just want us to resolve this issue as part of completing this proposal.
I can see a few options:
We make changing PP always require a restage. We could then defer implementing rules around when only a restart would be necessary later.
I'm not a fan of doing it this way, and we have a poor track record of actually implementing these sorts of optimizations later.
I prefer this option. Any downsides to it?
On Friday, February 6, 2015, Michael Fraenkel notifications@github.com wrote:
I agree, but we do need to come to terms on this issue. I just want us to resolve this issue as part of completing this proposal.
I, too, prefer option 2. We haven't had more than one stack until now. We (CF) need to be prescriptive on stack declarations. Grr, I just realized that all a stack can represent is a platform. A given Cell or DEA can provide multiple stacks. It seems like we have a gap here, unless I jumped the gun on assuming multiple stacks. But it would also align with Docker support.
OK, option 2 it is -- which is already what I have in the doc. I'll try to emphasize the point when I e-mail vcap-dev.
@onsi - If a Task or LRP does not specify constraint, this doc indicates that it means "run anywhere". I'm assuming the intention is to eventually paint Windows cells with a windows placement pool tag? If that's the case, then does "run anywhere" include Windows? And if so, does that mean that everyone has to provide a constraint?
@flavorjones - yes, if you have a mixed-OS collection of Cells you'll almost certainly have to provide a constraint, but Diego's not going to enforce that.
Windows would be painted with the stack:windows placement pool tag so that it will play nicely with CC's notion of stack.
Hey all,
I've thought about this a bunch more and I think I have a better way of dealing with Stack and supporting multiple rootfses (something we're going to need to make the transition to Diego + cflinuxfs2 smoother).
I've updated the proposal here
I quite like where things ended up: rootfs is much better defined, and placement pools aren't polluted by stacks. The only remaining awkward bit is the translation in NSYNC/Stager from "stack" to "rootfs", but I think that's OK.
Thoughts?
I prefer that the Rep get the info from Garden; that way we have an easier time keeping things in sync. The Rep already needs Garden up and running before it does anything, so we might as well have it pull some data as well.
The preloaded stuff makes sense since we actually match on the rootfs name. Today we assume Docker is Linux. If Docker becomes available on Windows, what is the RootFSProvider called? DockerWin?
I am leaning toward Option A because I don't see how Option B really plays out. It's easy to Marshal this stuff but difficult to Unmarshal, since we have no idea what it is unless we carry the type, and then we somehow have to map it to something concrete.
I was wondering why we bother with 2 fields. Couldn't we just get away with a URL? Docker is already handled, and for the other it would just be something like preload:///cflinux2.
We still wouldn't know how to differentiate between platforms: Docker on Windows, Linux, ppc, etc. We already have an issue on the preload since it doesn't account for platform; cflinux2 doesn't tell me whether it's Intel or ppc. This is something we need to solve sooner rather than later, since I know we will have CF on ppc.
I think ppc would necessarily have a different rootfs - call it cflinuxfs2-ppc - that will disambiguate.
Re docker: docker-windows? Boo... but it works.
I like the idea of just having the scheme -- will update.
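With the single-URL approach, the provider falls out of the URL scheme. A sketch using Go's net/url (the scheme names here are taken from the discussion and are assumptions, not settled API):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ProviderAndName splits a rootfs URL such as "preloaded:cflinuxfs2-ppc"
// or "docker:///ubuntu" into its provider (the URL scheme) and a name.
func ProviderAndName(rootfs string) (provider, name string, err error) {
	u, err := url.Parse(rootfs)
	if err != nil {
		return "", "", err
	}
	// Opaque covers "preloaded:name"; Path covers "docker:///name".
	name = u.Opaque
	if name == "" {
		name = strings.TrimPrefix(u.Path, "/")
	}
	return u.Scheme, name, nil
}

func main() {
	p, n, _ := ProviderAndName("preloaded:cflinuxfs2-ppc")
	fmt.Println(p, n) // preloaded cflinuxfs2-ppc
}
```

Platform-specific rootfses then need no extra field: "cflinuxfs2-ppc" vs "cflinuxfs2" disambiguates within the same scheme.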
I'm thinking of killing off Disallow - in practice it doesn't really work to list out Disallowed pools. How do I add a new cordoned-off zone and ensure applications don't end up there? Update all the Disallow rules on all the existing PPs. Yuck.
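With Disallow gone, cell eligibility reduces to a subset check: a cell qualifies iff it carries every required tag. A sketch of that matching (illustrative, not the scheduler's actual code):

```go
package main

import "fmt"

// Eligible reports whether a cell's tags satisfy a constraint's Require
// list: every required tag must be present on the cell. Cells in a new
// cordoned-off zone simply lack the required tags and are never chosen,
// with no Disallow bookkeeping needed.
func Eligible(cellTags, require []string) bool {
	have := map[string]bool{}
	for _, t := range cellTags {
		have[t] = true
	}
	for _, t := range require {
		if !have[t] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(Eligible([]string{"stack:trusty64", "zone:a"}, []string{"zone:a"})) // true
	fmt.Println(Eligible([]string{"stack:trusty64", "zone:b"}, []string{"zone:a"})) // false
}
```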
Met with @dieucao @zrob @ematpl today. Couple of tweaks.
We're going to go with PlacementConstraint as the noun in CC.
A space can be associated with a PlacementConstraint for staging and a PlacementConstraint for running.
One can set the default staging PlacementConstraint and the default running PlacementConstraint. These apply to spaces that don't have an explicit PlacementConstraint specified.
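Resolution of a space's effective constraint against the defaults might look like this sketch (the names are taken from the comment above; the structure is an assumption, not CC's API):

```go
package main

import "fmt"

// PlacementConstraint is the CC noun from the discussion; the field is
// illustrative.
type PlacementConstraint struct {
	Require []string
}

// Space may carry explicit staging/running constraints; nil means
// "fall back to the operator-set default".
type Space struct {
	Staging *PlacementConstraint
	Running *PlacementConstraint
}

// EffectiveRunning returns the space's running constraint, falling back
// to the default when the space has none specified.
func EffectiveRunning(s Space, defaultRunning PlacementConstraint) PlacementConstraint {
	if s.Running != nil {
		return *s.Running
	}
	return defaultRunning
}

func main() {
	def := PlacementConstraint{Require: []string{"zone:shared"}}
	fmt.Println(EffectiveRunning(Space{}, def).Require) // [zone:shared]
}
```

An analogous EffectiveStaging would resolve the staging side; keeping the two independent lets operators, say, pin staging to a dedicated pool while running stays on the default.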
Stories are in. I'm closing it out.
https://github.com/pivotal-cf-experimental/diego-dev-notes/blob/master/accepted_proposals/placement_pools.md is the MVP proposal for placement pools. I'll post details to vcap-dev after a round of initial internal feedback!