:+1:
Ref #81
:+1:
Proposed API contract:
POST /v2/hooks/key
{"app": "$APP_NAME", "public_key": "$PUBLIC_KEY"}
200 OK
404 NOT FOUND
500 INTERNAL SERVER ERROR
cc/ @helgi
Where are you going to get the public_key from?
https://github.com/deis/builder/blob/master/rootfs/etc/confd/templates/authorized_keys gets it from etcd, and that's what we are solving for here.
I'm thinking you may want:
GET /v2/hooks/keys/$APP_NAME
{"users": {"aaron": "keysandthings"}}
That'd get you the keys for the given app, unless you have an alternative way of getting the keys?
Wouldn't it be nice if the controller updated the builder whenever there is a new user key? Some mechanism where the controller and builder talk to each other - either RPC or a simple HTTP server inside the builder that listens for keys - with the controller calling that endpoint whenever someone does deis keys:add. The same could apply to app deletion.
This avoids the builder depending on the database and the k8s API, but it couples the builder to workflow.
That seems too complicated - the builder should never talk to the DB, ever. I can see your point about creating a tight coupling between the two, but running an HTTP server on the builder creates unnecessary state (IMO).
If we go down the route of tightly coupling the two, then we need to discuss the implications of that, the same way we have to consider / discuss why we wouldn't have each component depend on the k8s API (which is happening more and more rather than less and less).
@helgi I misspoke in the payload that I posted above (the result of doing too many things at once). Yes, the builder w/o etcd wouldn't have a public key. I agree that the endpoint should look very similar to the one you proposed. However, I advocate for putting the user in the path, since a GET can't have a request body. The request would then look like GET /v2/hooks/keys/$APP_NAME/$USER_NAME.
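For illustration, here's a rough sketch of what that call could look like from the builder side, where a 404 is taken to mean the app or user doesn't exist. The function name, error value, and the absence of auth handling are all assumptions for the sketch, not actual builder code:

```go
// Sketch only: one controller call per git push against the proposed
// GET /v2/hooks/keys/$APP_NAME/$USER_NAME endpoint.
package hooks

import (
	"errors"
	"fmt"
	"io/ioutil"
	"net/http"
)

// errNotFound signals that either the app or the user doesn't exist.
var errNotFound = errors.New("app or user not found")

// getKeysForPush is a hypothetical helper; the controller URL and any
// auth are assumed to be handled elsewhere.
func getKeysForPush(controllerURL, app, user string) (string, error) {
	url := fmt.Sprintf("%s/v2/hooks/keys/%s/%s", controllerURL, app, user)
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	switch resp.StatusCode {
	case http.StatusOK:
		body, err := ioutil.ReadAll(resp.Body)
		return string(body), err
	case http.StatusNotFound:
		return "", errNotFound // reject the push: app or user is gone
	default:
		return "", fmt.Errorf("unexpected status %d from controller", resp.StatusCode)
	}
}
```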
I'd like to address some of the general comments in sections below, and I'd like to hear thoughts.
TL;DR my suggestion is to make the builder query the controller for both app existence and key validity.
If the controller updates the builder, the builder has to hold more state. However, if the builder queries the controller (as described in the previous section), then state is isolated to the controller, where it already resides (more precisely, in the DB that the controller queries).
Managing state in a distributed system is challenging, so let's try to keep it as isolated as possible, and minimize the amount of state in the builder, in this refactor.
The builder already makes 3 RPCs to the controller on every build, and we've accepted the fact that the two components are tightly coupled and will be through the beta.
Adding a 4th RPC is not ideal from a performance perspective, but adding the RPC now and reducing them later is a manageable task.
I 100% agree with @helgi that the builder should never access the DB. I'll expand - nothing but the controller should ever access the DB (see my comments above on isolating state).
Additionally, we've built an abstraction layer - the controller API - on top of the DB. If anything circumvents that layer, the abstraction is leaky and we will have introduced a large maintenance problem (imagine if many components are directly accessing the DB and we try to refactor the schema).
I do agree that the event stream could be a good source from which to listen for app deletion events. However, if we do so, then we rely more heavily on builder state. Instead, why not rely on the aforementioned GET /v2/hooks/keys/$APP_NAME/$USER_NAME RPC to determine whether the app exists as well? If it doesn't exist, then we can delete its repository folder.
I also disagree that the event stream is a good data source for adding and removing SSH keys (again, for builder state reasons).
I'm not clear if anyone is proposing that key management strategy above, but I'd like to make my position clear.
In my example the GET used $APP_NAME in the URL, and the JSON below it was the return body giving you all the users for the given app. Yours simply takes the granularity one step further, and I'm fine with that.
If we use the API for app existence and user existence, then you'd get a 404 either way. That's probably enough, unless you want to provide more useful error messages.
I'm not seeing how much more state the builder would have to maintain if it gets app additions / removals from the k8s API - not via an event stream, but rather via label selectors (rough sketch below). The state is owned by k8s, and the builder simply uses it to see when it should, for example, clean up older apps.
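To illustrate, a label-selector lookup with client-go might look something like this. The label ("heritage=deis") and the choice of listing namespaces are assumptions about how apps would be labeled, purely for illustration:

```go
// Sketch only: ask the k8s API for the apps the cluster currently knows
// about, using a label selector instead of an event stream.
package hooks

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// listDeisApps lists app namespaces by label; "heritage=deis" is a
// hypothetical label, not necessarily what the controller actually sets.
func listDeisApps() ([]string, error) {
	cfg, err := rest.InClusterConfig() // the builder runs inside the cluster
	if err != nil {
		return nil, err
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	nsList, err := clientset.CoreV1().Namespaces().List(
		context.TODO(), metav1.ListOptions{LabelSelector: "heritage=deis"})
	if err != nil {
		return nil, err
	}
	apps := make([]string, 0, len(nsList.Items))
	for _, ns := range nsList.Items {
		apps = append(apps, ns.Name)
	}
	return apps, nil
}
```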
If you rely on the API on git push to figure out if an app exists, then when will you figure out what doesn't exist anymore? Are you only ever querying the API in the git push scenario, or are you planning on expanding this further?
Thanks for the clarification, and I agree - it'd be nice to determine whether the app or user doesn't exist without having to check error messages.
I see what you're saying RE the Kubernetes API now, but regardless, the builder would still hold state - the state of polling - and we can get race conditions if the builder runs with more than one replica.
For example, say we're running 3 builder pods (replicas: 3) and pod A polls the k8s API and finds that myapp is deleted, but pods B and C don't. If a git push ends up on B or C, then it could succeed even though the app is actually deleted.
On the other hand, if the builder makes the controller API call on each git push, then if the controller returns a 404, it knows that the app doesn't exist (using your version of the API).
I don't see a need for the builder to know whether a given app exists at any other time. Am I missing something crucial there?
Are old apps ever left behind if you are not cleaning things up? Are checkouts always blown away so there is no old code / apps around on the builder pods?
In that scheme, old apps would be left behind if someone deletes an app and never tries a git push. That could be solved by a "reaper" goroutine that runs outside the critical path (and nothing in the critical path would be dependent on its state). However, old code would be deleted when the RPC happens on a git push.
What I'm confused about is how the reaper determines what to delete and what to keep without checking with something somewhere. It isn't the end of the world having an active app reaped, I suppose.
All app checkouts live in the same directory (see here), so the reaper's job would be to walk that directory and query the controller for each app (at a slow interval), as sketched below.
Note that the app checkout directories and the reaper's state are expendable.
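A minimal sketch of that reaper loop, assuming a single checkout directory and a hypothetical appExists helper that wraps the controller call (the path and interval here are made up):

```go
// Sketch only: a background "reaper" that walks the checkout directory and
// asks the controller whether each app still exists.
package hooks

import (
	"log"
	"os"
	"path/filepath"
	"time"
)

// checkoutDir is a stand-in for wherever app checkouts actually live.
const checkoutDir = "/home/git"

// appExists would wrap the controller call (e.g. treat a 404 from
// GET /v2/hooks/keys/$APP_NAME as "gone"); elided here.
func appExists(app string) (bool, error) {
	return true, nil
}

// reapOnce walks the checkout directory and removes apps the controller
// no longer knows about. Nothing in the git push critical path depends on it.
func reapOnce() {
	entries, err := os.ReadDir(checkoutDir)
	if err != nil {
		log.Printf("reaper: %v", err)
		return
	}
	for _, e := range entries {
		if !e.IsDir() {
			continue
		}
		exists, err := appExists(e.Name())
		if err != nil {
			continue // skip on error; try again next interval
		}
		if !exists {
			os.RemoveAll(filepath.Join(checkoutDir, e.Name()))
		}
	}
}

// StartReaper runs at a slow interval; its state (and the checkouts
// themselves) are expendable.
func StartReaper() {
	go func() {
		for range time.Tick(10 * time.Minute) {
			reapOnce()
		}
	}()
}
```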
So instead of querying the k8s API you want to do the same against the controller API. I'm not really seeing the difference or benefit beyond perhaps not having to deal with the k8s API.
I'm fine with querying the k8s API, I'm just advocating for querying something before running the receive hook on each git push, possibly in addition to running it in the background.
Maybe we're saying the same thing...
I think we are ... So here is how I see it:
- git push queries the Controller API for app / user and gets a 404 if either is missing (you could make 2 calls if you want to know which is missing, I guess?) and gets back a key if all is okay
- reaper queries the k8s API to get active apps in one API call.
- reaper can work off the Controller API as well - just another endpoint, or you use the same endpoint as git push (sans username) to check each app you know about.
:+1: from me
:100:
Ref https://github.com/deis/workflow/pull/336 for the keys endpoint
Here are a bunch of use cases I'm not clear about.
@smothiki I've posted answers to some of your questions below. I'm unclear on some others so can you please point out if I've left something unanswered?
Do we have to make an API call to get the keys associated with the app for every git push?
Not necessarily. Since we know the app and the user, we could make the call to just get the key for the (app, user) pair.
Wouldn't it be redundant most of the time?
No, because multiple users may git push to the same app.
This check at git push will definitely hurt user experience.
On what metric?
An app is not associated with a key, and a user can have many keys.
Right, so the builder would make a call to the controller's /v2/hooks/keys/$APP/$USER endpoint, which would return the keys for that user.
Let's say a user has 100 apps running and uses 200 keys. The call returns 200 keys for an app. Either we have to match the key presented by the client against these 200 keys, or write the 200 keys to the authorized_keys file and let the SSH server figure it out. If we go with the writing option, most of the keys will already have been written, so we have to keep an index of which keys are present and figure out how to write them efficiently.
My concern is about optimizing the git push operation if we are making a call to the controller for each git push.
Also, in practice, git pushes happen far more often than deis keys:add. One optimization is to check the authorized_keys file first and then make an API call only for the keys that aren't there.
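A sketch of what that write path could look like: check whether a key returned by the controller is already in authorized_keys and only append it when missing, so repeated pushes don't rewrite the whole file. The file path and helper names are assumptions for illustration:

```go
// Sketch only: avoid rewriting authorized_keys for keys we already have.
package hooks

import (
	"fmt"
	"os"
	"strings"
)

// authorizedKeysPath is a stand-in for wherever the builder's SSH server
// reads its authorized_keys file from.
const authorizedKeysPath = "/home/git/.ssh/authorized_keys"

// hasKey reports whether the given public key is already on disk.
func hasKey(pubKey string) (bool, error) {
	data, err := os.ReadFile(authorizedKeysPath)
	if err != nil {
		if os.IsNotExist(err) {
			return false, nil
		}
		return false, err
	}
	return strings.Contains(string(data), strings.TrimSpace(pubKey)), nil
}

// ensureKey appends the key only when it isn't already present, so most
// pushes skip the file write entirely.
func ensureKey(pubKey string) error {
	ok, err := hasKey(pubKey)
	if err != nil || ok {
		return err
	}
	f, err := os.OpenFile(authorizedKeysPath, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0600)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = fmt.Fprintln(f, strings.TrimSpace(pubKey))
	return err
}
```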
The call returns 200 keys for an app. Either we have to match the key presented by the client against these 200 keys, or write the 200 keys to the authorized_keys file and let the SSH server figure it out.
Correct. There is no way around this, regardless of the method of acquiring keys from the controller.
My concern is about optimizing the git push operation if we are making a call to the controller for each git push.
We already make 3 API calls to the controller for each git push operation. While that's not a good reason to add a fourth, it does show that builder <--> workflow RPCs are proven to work. As a side note, we should strive to reduce the latency, complexity, and number of RPCs in a separate patch (I've created #144 for that).
Also, in practice, git pushes happen far more often than deis keys:add.
I agree. Again, though, there's no way around this regardless of the key delivery mechanism. The RPC method as proposed returns the full list every time, however, and that's non-optimal. Perhaps a future optimization would be to have the controller deliver only deltas.
One optimization is to check the authorized_keys file first and then make an API call only for the keys that aren't there.
I agree. See my above comment on a possible future optimization.
The builder reads keys from etcd. We should avoid doing that and read keys from the database instead.