Awesome! I'll take a more detailed look tomorrow :)
Early feedback:
This is otherwise awesome, thank you for getting a move on this!
If it works fine we can also just start using it soon, and make changes to it slowly. I do want us to do it the 'right way' tho, since both we and others will be using this for a long time :)
Thank you Yuvi for your feedback!
I am not sure if the Python client is a good idea, because it does not seem to be designed for async/coroutine requests. It does provide callback-style async IO, but I am not sure if it could easily work with the coroutine style of Tornado.
I definitely agree that Google Cloud API calls are much better than the gcloud command line. We could migrate to that in the future. In addition to being lazy, we are also uncertain about how to deal with authentication, because it has not been decided how we are going to deploy the script.
It is my personal decision not to use an OOP-style class in gcloud_update.py. From my perspective, classes should be introduced when there is data to be associated with functions, which does not seem to be the case in this script so far. We might see the necessity after we start to use the Google Cloud APIs, and it won't be difficult to switch.
Reasons for not running this version of the script inside a container:
4a. We need to replace kubectl proxy with the recommended authentication settings for "Accessing the API from a Pod", which involve service accounts. I prefer to do that only after it is certain that we need such complexity (a rough sketch of that alternative follows this list). See https://kubernetes.io/docs/user-guide/accessing-the-cluster/#accessing-the-api-from-a-pod for details.
4b. Uncertain how to deal with Google Cloud authentication credentials.
4c. Uncertain if I should run populate.bash just inside a container.
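For reference, here is a minimal sketch (assuming the kubernetes Python client, which this version of the script does not yet use) of what the in-cluster, service-account authentication from the linked docs looks like, with a kubeconfig fallback for local runs:

```python
from kubernetes import client, config

try:
    # Inside a pod: uses the mounted service-account token and cluster CA cert.
    config.load_incluster_config()
except config.ConfigException:
    # Outside the cluster (e.g. a developer laptop): fall back to kubeconfig,
    # which is roughly what the current kubectl proxy approach relies on.
    config.load_kube_config()

v1 = client.CoreV1Api()
print("Nodes visible:", len(v1.list_node().items))
```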
Makes sense. Honestly, I might prefer command-line arguments.
I will fix that soon, if you are sure that diagnostics data will not be lost when statsd is killed.
Additional point regarding populate.bash: @yuvipanda it would be great if you could incorporate a Python version of that in the autoscaler gcloud_update.py.
Thanks.
Thanks for the response!
- I'm confused about where async code is being used right now - I don't see any yields. requests, the current library being used, is also not async. I'm also not sure why this code needs to be async?
- Sensible, and I agree. But, see point 3.
- While I agree that encapsulation is a good line to draw for 'class or functions', I think it is the wrong one in this case. Specifically, we'll want this to work on Azure too before the end of the semester - and defining an interface that can be implemented by various 'providers' is a decent use case for a class in Python. Doing this without classes usually involves a lot of 'ifs' scattered around the code.
- I really, really think we should be using the Kubernetes API for this - it does all of this automatically for us already. I do want this to run on the k8s cluster - one of the advantages of this year's deployment over last year's is that everything is managed and run in one way (Kubernetes), and I'd very much like to keep it that way. If we have to run this elsewhere, then we either end up with infrastructure that isn't 100% reproducible, or we have to use Ansible / Puppet / some other thing in addition to Kubernetes to provision and run this script reproducibly in some other way.
For populate.bash, it is something we should replace with an in-cluster replacement soon anyway - a DaemonSet that pulls all images required on the cluster. I'll work on that this week so we won't have to do populate.bash anymore - you can ignore that from your autoscaler.
I just woke up, so I'll actually read through the code in a few hours :)
Thank you Yuvi!
We have not yet figured out whether the scale function should be called in an event-driven way or run by cron (or similar) at a fixed interval, as I noted in the original pull request. If it needs to be called in an event-driven way, I might introduce Tornado or another async web framework. Async won't be needed if it is eventually called by cron.
I agree with your point. I think it is better to write 'interfaces' after we understand enough about other providers, including Azure, so that we don't have to modify our interfaces frequently. We could do that later this semester (a rough sketch of such an interface is below).
I do personally think that such a service should run as a pod, like cull (and be event-driven, honestly). We can try to figure out the authentication parts and make it work as a pod in the near future.
Thank you, we will take populate.bash out. :)
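Purely as an illustration of the provider interface mentioned above (and deliberately deferred here), such a class could look roughly like the sketch below; the class and method names are assumptions, not anything in this PR:

```python
from abc import ABC, abstractmethod

class ScalingProvider(ABC):
    """Cloud-specific operations the autoscaler would delegate to."""

    @abstractmethod
    def add_nodes(self, count):
        """Grow the node pool by `count` nodes."""

    @abstractmethod
    def delete_node(self, node_name):
        """Remove an already-drained node from the cloud provider."""

class GoogleCloudProvider(ScalingProvider):
    def add_nodes(self, count):
        raise NotImplementedError("would resize the GCE instance group")

    def delete_node(self, node_name):
        raise NotImplementedError("would delete the GCE instance")
```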
I just realized - when scaling 'up', I think we should first look for nodes marked unschedulable and re-mark them as schedulable before adding more nodes. You could otherwise end up in a situation where you are using a lot more nodes than necessary simply because some are marked unschedulable.
So the scale-up logic should check whether any nodes are marked 'unschedulable' before creating new nodes, and if there are, it should just mark them as schedulable instead...
@yuvipanda I believe the current version will first try to mark nodes schedulable to satisfy the demand. It will only scale up the number of nodes if it cannot satisfy the demand that way.
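As a sketch of that ordering (illustrative only; the helper names here are not the actual functions in scale.py):

```python
def scale_up(nodes, nodes_needed, uncordon, create_new_nodes):
    """Reuse cordoned nodes before asking the cloud provider for new ones."""
    cordoned = [n for n in nodes if n.spec.unschedulable]
    reused = cordoned[:nodes_needed]
    for node in reused:
        uncordon(node)                      # mark the node schedulable again
    remaining = nodes_needed - len(reused)
    if remaining > 0:
        create_new_nodes(remaining)         # only now grow the instance group
```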
@yuvipanda I feel that I have added enough debug logs to track down the specific actions taken by the autoscaler, and I have addressed the other immediate concerns.
Awesome. I'll take another look later today :)
@yuvipanda
I did not remove the one-line functions because the Kubernetes Python client describes a node in a different way. The Python client uses a V1Node class, while parsing the JSON response from the REST API calls produces a dict. These one-line functions provide an abstraction that avoids changing the logic code when switching between node.metadata.name and node["metadata"]["name"].
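For illustration, the kind of one-line accessor being described could look like the sketch below; the real helpers in utils.py presumably wrap one representation at a time, and this version only shows why the indirection keeps the calling code unchanged:

```python
def get_node_name(node):
    """Return a node's name regardless of how the node was obtained."""
    if isinstance(node, dict):
        return node["metadata"]["name"]   # raw JSON from a direct REST call
    return node.metadata.name             # V1Node from the Kubernetes Python client
```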
@tonyyanga but you're not using node["metadata"]["name"] anymore, right? So why keep them?
@yuvipanda
In the future we might decide to switch to native REST API calls. The abstraction provided by those one-line functions keeps the code outside utils.py unchanged when switching between the Kubernetes Python client's node.metadata.name and native JSON's node["metadata"]["name"].
Two possible cases exist:
A quick example for 2:
V1Node patch_node(name, body, pretty=pretty) is called when marking a node schedulable or unschedulable. It is documented in the link below:
https://github.com/kubernetes-incubator/client-python/blob/master/kubernetes/docs/CoreV1Api.md#patch_node
The documentation is incorrect. First, in the example code, kubernetes.client.UnversionedPatch() simply causes an error. Second, a valid body turns out to be a V1Node object, which cannot be told from the documentation. Third, and probably the most important one, it provides no choice among application/json-patch+json, application/merge-patch+json, and application/strategic-merge-patch+json.
Please see https://github.com/kubernetes/community/blob/master/contributors/devel/api-conventions.md#patch-operations for details about different PATCH strategies.
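For context, here is a minimal sketch of the patch_node call being discussed. The body shown is a plain dict merge-patch, which is an assumption on my part rather than what the client version used in this PR accepted (as noted above, it wanted a V1Node):

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

def set_unschedulable(node_name, unschedulable=True):
    """Cordon (True) or uncordon (False) a node by patching its spec."""
    body = {"spec": {"unschedulable": unschedulable}}
    return v1.patch_node(node_name, body)
```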
While I agree about the documentation issue, I don't think switching back to writing our own client implementation is something we would/should ever do. Consider the amount of authentication code you'd need to write :) And if for some reason we want this to be a web application, we can always use flask+uwsgi or something along those lines. In general, code shouldn't be async unless it needs to be, and when it does you need to change a ton of fundamental things anyway. The same goes for switching back to calling the API directly.
For the documentation issue, we should be good open source citizens and send them issues and patches :D
Keeping extra code around for possible future cases is a violation of the very useful YAGNI principle (https://martinfowler.com/bliki/Yagni.html). It's the same reason I'm ok with us not using classes or abstracting for multiple providers right now (since we do not need it yet). I do think we should have our code be as simple and small as possible, with just as much (not more and not less!) amount of indirection.
Haven't looked at the code again (sorry!). Should we do a test run of this on prod this week? I'll be happy if we manage to downsize it just once, even.
Sorry, I have been busy too. I am free between 1 PM and 4 PM tomorrow and can show up at BIDS to do the test together (in case anything bad happens :) ). Let me know when and where would be good for you.
I have to be at a clinic all day tomorrow - how about Thursday?
How about Thursday afternoon at 1:30PM? Maybe we could chat on Slack.
I think we should also not use 'current-context' - which is what it does right now. 'dev', 'prod' or 'playground' must be explicitly set as a command-line parameter.
Doing things on the wrong cluster has already deleted everyone's homework once (I did it accidentally; thankfully this was before classes started). So we should not rely on 'current-context' at all anywhere.
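A minimal sketch of what requiring an explicit cluster could look like; the --context flag and its choices here are assumptions for illustration, not the script's actual interface:

```python
import argparse

from kubernetes import client, config

parser = argparse.ArgumentParser(description="data8 autoscaler")
parser.add_argument("--context", required=True,
                    choices=["dev", "prod", "playground"],
                    help="kubeconfig context to act on; deliberately no default")
args = parser.parse_args()

# Load the named context explicitly so the script never acts on whatever
# cluster kubectl's current-context happens to point at.
config.load_kube_config(context=args.context)
v1 = client.CoreV1Api()
```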
Scaling up and down now successfully use the Google Cloud Python Client. Maybe you could look into some of the code for this @yuvipanda? :)
@yuvipanda
Addressed the most urgent concerns we discussed on Wednesday.
Now it uses the Google Cloud Python client and includes several safety measures against admin mistakes.
I think the pull request can be merged now, with future improvements tracked using issues.
The hub and proxy pods now have a label:
hub:
  labels:
    hub.jupyter.org/autoscaler-critical: "true"
proxy:
  labels:
    hub.jupyter.org/autoscaler-critical: "true"
This is set in supercommon.yaml in the config repo
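A hedged sketch of how the label could be consumed on the autoscaler side (the helper name is hypothetical; only the label key and value come from this thread):

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

def node_hosts_critical_pod(node_name):
    """True if any pod on the node carries the autoscaler-critical label."""
    pods = v1.list_pod_for_all_namespaces(
        field_selector="spec.nodeName={}".format(node_name))
    return any(
        (pod.metadata.labels or {}).get("hub.jupyter.org/autoscaler-critical") == "true"
        for pod in pods.items)
```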
Heya! I am back and working now again, so let's get this going sometime this week?
Sure. If you need me to show up in BIDS, how about Tuesday after 4PM?
@tonyyanga sure! let's do that!
@yuvipanda Cool. I will get to BIDS around 4:10PM. See you then!
I sent a calendar invite, hopefully to the correct email :)
Hey, Yuvi,
I had a few more bug fixes and updated the README to reflect our updated scaling logic. So far it works nicely on the prod cluster, without actually executing the commands.
I will try to actually run scale actions tonight or tomorrow morning to make sure it works as claimed, or you can run it if you know the lab schedule better.
I will check tonight and see what other steps we need to take, e.g. changing labels.
Thank you!
@yuvipanda
Here are two things that I think could be done:
Not all student pods are labeled with hub.jupyter.org/autoscaler-critical: "true". This is at least the case with the stat28 namespace on the prod cluster.
Pods of our Jupyter deployment that can be killed (in other words, non-student pods) need to be labeled in a different way, so that the autoscaler knows that the nodes hosting them can be cordoned or shut down.
@tonyyanga for labeling, I can tomorrow:
That sounds like a better labeling scheme to me than just bools. Does that sound ok or would it be too complex to implement in the scaler?
Another alternative might be:
I'll also make sure we do a deploy to stat28 so it has the labels.
@yuvipanda
For the current autoscaler, only the key, rather than the value, of a label is supported for labeling pods. Specifically, we should label pods like cull that can be arbitrarily killed with a different label key, such as hub.jupyter.org/autoscaler-omit: "true".
If you find it more straightforward to use the value of a certain specified key, I can also make changes accordingly.
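To make the key-only vs. key=value distinction concrete, here is standard Kubernetes label-selector syntax for both; illustrative only, with the omit label key taken from the proposal above:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Key-only: matches any pod that carries the label key at all, whatever its value.
omittable = v1.list_pod_for_all_namespaces(
    label_selector="hub.jupyter.org/autoscaler-omit")

# Key=value: matches only pods whose label value is exactly "true".
omittable_true = v1.list_pod_for_all_namespaces(
    label_selector="hub.jupyter.org/autoscaler-omit=true")
```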
@yuvipanda
Also, I realized that the puller pods will not be killed by cull. They will stay there with PodCompleted status. Either we could omit them, or we need to clean them up somehow.
I am working on a bug with shutting down empty nodes.
@tonyyanga don't worry about puller-pod, I'm working on making those properly cleaned up by themselves.
We ran this together and everything seems ok \o/
Current plan is for @tonyyanga to run this manually 2 times a day for the next week, and then we can look at running this automatically.
Awesome work, @tonyyanga and @jiefugong :D
Closes #74
Summary
Provide auto-scaling without killing user pods, based on Kubernetes' Unschedulable tag for nodes. See scale/README.md for the expected scaling behaviors.
How to test
See README.md for directions. Please make sure kubectl proxy is ready before running scale.py.
Comments requested
- How do you want the scaling script to be run? Some options:
- How to set the parameters, including min utilization, max utilization, etc.?
- Is it safe to kill a node running cull on it? General proposals for other critical services to be protected?
- What tests do you want to run with it? We have run some tests, but they are far from comprehensive.