@yuvipanda would you mind taking a look at this PR? https://github.com/data-8/infrastructure/issues/13#issuecomment-286578194
to-do:
@yuvipanda what do you think of the backup script so far?
Great and quick work! I haven't had time to look at it yet, but one question: how will this be invoked? Each node also has a 100G base disk attached to it that we do not want to back up - only the disks attached to the persistent volume claims in the namespaces we care about. So I was thinking we'd look at the PVC objects using the Kubernetes API client and then derive the Google Cloud disk names from there before snapshotting. How does filtering by name work?
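For illustration, deriving disk names from PVC objects with the Python Kubernetes client could look roughly like this (a sketch; the namespace is a placeholder, not necessarily what the cluster uses):

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

# 'datahub' is a placeholder namespace for illustration
for pvc in v1.list_namespaced_persistent_volume_claim('datahub').items:
    # a bound PVC names its PV, whose spec carries the backing GCE disk name
    pv = v1.read_persistent_volume(pvc.spec.volume_name)
    if pv.spec.gce_persistent_disk:
        print(pv.spec.gce_persistent_disk.pd_name)  # the disk to snapshot
```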
Also, have you tested this in -dev?
Thank you for the quick work! I can look at the code for style and design later, but since snapshotting doesn't have many downsides (unlike autoscaling!), we can also just deploy this and then do CR afterwards.
hi @yuvipanda, thank you for the quick comment. to describe what the script does right now: it is more or less an organized, glorified version of this one-liner:
gcloud compute disks list | grep gke-prod-49ca6b0d-dyna-pvc | awk '{ print $1; }' | xargs -L1 gcloud compute disks snapshot
however, i realize this is probably not the best way to go about doing things. i wasn't sure how quickly this needed to be enabled, so i just wanted to get it working first. if you'd like, i'd assume the next step would be to use the kubernetes client, as you mentioned, to look at the disks associated with each notebook PVC?
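For context, the same filter-by-prefix pipeline could be done with the Google API Python client rather than shelling out (a rough sketch; the project ID is a placeholder and client-side prefix matching is an assumption about how the filtering would work):

```python
from googleapiclient import discovery

PREFIX = 'gke-prod-49ca6b0d-dyna-pvc'

compute = discovery.build('compute', 'v1')
# 'data-8' is a placeholder project ID, not necessarily the real one
request = compute.disks().aggregatedList(project='data-8')
while request is not None:
    response = request.execute()
    # aggregatedList groups disks by zone; scopes with no disks carry a warning instead
    for scope, scoped_list in response['items'].items():
        for disk in scoped_list.get('disks', []):
            if disk['name'].startswith(PREFIX):
                print(disk['name'])  # candidate for snapshotting
    request = compute.disks().aggregatedList_next(
        previous_request=request, previous_response=response)
```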
Update: sorry I think I misunderstood some parts about the client. Just deleted my comment earlier.
Maybe we could do better with error handling? Right now, on an error, the Google Cloud API client will throw a googleapiclient.errors.HttpError whenever an API call does not get an HTTP 2xx response. Maybe you want to log such exceptions before re-raising them, in case you want to run it from cron.
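Something along these lines, perhaps (a sketch: the snapshot_disk helper and its naming scheme are made up for illustration, but googleapiclient.errors.HttpError is the real exception class raised on non-2xx responses):

```python
import logging

from googleapiclient import errors

logger = logging.getLogger('backup-disks')

def snapshot_disk(compute, project, zone, disk_name):
    """Snapshot one disk, logging API failures before re-raising them."""
    try:
        body = {'name': '{}-backup'.format(disk_name)}  # hypothetical naming scheme
        return compute.disks().createSnapshot(
            project=project, zone=zone, disk=disk_name, body=body).execute()
    except errors.HttpError:
        # logger.exception records the full traceback, so when cron runs this,
        # the log (or cron mail) shows exactly which disk and call failed
        logger.exception('snapshotting disk %s failed', disk_name)
        raise
```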
@yuvipanda, could you review this at your leisure? Some notes:
Added the Kubernetes client. The backup script, backup-disks.py, now collects all the persistent disks related to the project. There might be some repeated code between here and the autoscaler script, but functionality-wise I think we can use this first and then extract shared code (such as what is provided by the Kubernetes client) out to a separate folder or something.
I tried running this, and got:
2017-04-03 22:42:14,679 INFO Filtered 229 disks out of 1869 total that are eligible for snapshotting
This seems to imply that only 229 disks are being snapshotted, while more than a thousand should be. Is that right?
Also, can we add command-line params to toggle things individually, and also to set the cluster explicitly? So an invocation could look like backup-disks.py --cluster=prod --backup --delete?
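A minimal argparse sketch of that interface (the flag names come straight from the comment above; defaults and help text are assumptions):

```python
import argparse

parser = argparse.ArgumentParser(
    description='Snapshot the GCE disks backing our PVCs and prune old snapshots')
parser.add_argument('--cluster', required=True,
                    help='cluster to operate on, e.g. prod or dev')
parser.add_argument('--backup', action='store_true',
                    help='create snapshots of eligible disks')
parser.add_argument('--delete', action='store_true',
                    help='delete old snapshots')
args = parser.parse_args()

# e.g. backup-disks.py --cluster=prod --backup --delete
```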
Otherwise looks great :D
Hey @yuvipanda thank you for the comments :)
I have gone ahead and added the command line arguments as you requested and changed a few more things. (I have also added the ability to replace a PV's underlying GCE disk with a new one that is later created from a snapshot -- this is done using popen right now because the Python client has some issues with patching.)
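Presumably the popen path shells out to kubectl; a repoint like that might look roughly like this (a sketch; the helper is illustrative, though spec.gcePersistentDisk.pdName is the standard PV field):

```python
import json
import subprocess

def repoint_pv(pv_name, new_disk_name):
    """Point an existing PV at a different GCE disk via `kubectl patch`."""
    patch = {'spec': {'gcePersistentDisk': {'pdName': new_disk_name}}}
    # shelling out instead of using the Python client's patch support
    subprocess.check_call(['kubectl', 'patch', 'pv', pv_name,
                           '-p', json.dumps(patch)])
```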
As for your question, the only thing left to do, in my opinion, to finish this PR up is to decide how to specify which disks (of all that belong to the project) are eligible for snapshotting in the first place. Right now I have specified that only disks backing pods of type notebook are to be snapshotted, but I am not sure this is the best way of doing it. There was also something from @SaladRaider about only making disks that begin with the prefix gke-prod-49ca6b0d-dyna-pvc snapshottable? Please let me know what criteria to filter these disks on and I will be more than happy to update the script :)
@yuvipanda, to update my understanding of your previous comment: from what @SaladRaider told me, only the underlying disks of certain pod types (hub, notebook) should be eligible for snapshots. if you agree, what is the best way you would like to label these pods? or is there some specified way you would like me to filter disks?
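If labels end up being the filter, the lookup could be a plain label selector on the PVCs (a sketch; the component label key and the namespace are hypothetical):

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# 'component' is a hypothetical label key; use whatever the charts actually set
pvcs = v1.list_namespaced_persistent_volume_claim(
    'datahub', label_selector='component in (hub, notebook)')
for pvc in pvcs.items:
    print(pvc.metadata.name)
```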
Sorry, been swamped!
The general strategy I think should be:
That should be good enough for us, I think
@yuvipanda done. let me know what you think, but i have tested this and it should work and reflect your new comments. i will write some more documentation soon, but this looks good to go to me in terms of functionality
@jiefugong cool! Do you wanna meet up for an hour sometime this week to finish this and get it deployed?
@yuvipanda absolutely! would you like to arrange a time to meet on slack, or otherwise let me know what times work best for you this week? :) i am free most mornings or later in the afternoon.
@yuvipanda you free sometime this upcoming week? i am pretty flexible, would love to get this deployed.
Hey! Yes, Monday afternoon? Say, 2pm? Can you send a calendar invite to yuvipanda@berkeley.edu?
Hi @yuvipanda, sorry for the super late response. Would tomorrow at 3 PM work instead? I've got class until then but would still love to meet up. If not please let me know what other times you're free this week and I'll do my best to accommodate :)
@yuvipanda I think we're ready for merge now -- I've incorporated all the advice you gave this afternoon. Will add Slack support soon, lmk if i missed anything!
\o/
Automatically creates snapshots of GCE disks and clears old snapshots
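For reference, the "clears old snapshots" half might look roughly like this (a sketch; the retention window, helper name, and lack of any name filtering are assumptions, not the actual script):

```python
import datetime

from dateutil import parser as date_parser  # pip install python-dateutil
from googleapiclient import discovery

RETENTION_DAYS = 7  # assumed retention window

def delete_old_snapshots(project):
    """Delete snapshots older than RETENTION_DAYS (illustrative only)."""
    compute = discovery.build('compute', 'v1')
    cutoff = (datetime.datetime.now(datetime.timezone.utc)
              - datetime.timedelta(days=RETENTION_DAYS))
    request = compute.snapshots().list(project=project)
    while request is not None:
        response = request.execute()
        for snapshot in response.get('items', []):
            created = date_parser.parse(snapshot['creationTimestamp'])
            if created < cutoff:
                compute.snapshots().delete(
                    project=project, snapshot=snapshot['name']).execute()
        request = compute.snapshots().list_next(request, response)
```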