det-lab / jupyterhub-deploy-kubernetes-jetstream

CDMS JupyterHub deployment on XSEDE Jetstream

Install the backup system to the Jetstream 2 deployment and restore user data #64

Closed zonca closed 2 years ago

zonca commented 2 years ago

Working on https://zonca.dev/2021/04/jetstream-backup-kubernetes-volumes-object-store.html. If everything works as expected, I should then be able to restore user data from OSN automatically, without having to do it manually.

Important: If you want your old data restored, please first log in to https://supercdms.zonca.dev/ so that your new volume is created, then reply below with your username.

For now I only have @zkromerUCD

TODO

zonca commented 2 years ago

Unfortunately, I noticed that backups stopped working on February 26. I was checking on them every month, but I should set up an automated way of monitoring them. Sorry about that.

@zkromerUCD I have restored your data with the latest backup I had.

image

zonca commented 2 years ago

Configured daily backups for the 3 current users (pibion, zonca, zkromer). I will monitor them for a few days.

zonca commented 2 years ago

ok, thanks @pibion. I fixed all the permission issues I had; it was just a misconfiguration on my end.

Now everything seems to be working fine, I am backing up:

pibion
thathayhaykid
tyjmartin98
zkromer
zonca
rahmanole

I will monitor for a few days, then think about a way to be notified automatically if anything fails.

zonca commented 2 years ago

ok, backups have been working fine:

restic snapshots --latest 12
repository 18a1c421 opened successfully, password is correct
ID        Time                 Host        Tags           Paths
---------------------------------------------------------------------
d89a4131  2022-04-26 01:00:40  host-0      pibion         /stash-data
884bd04e  2022-04-26 01:10:25  host-0      thathayhaykid  /stash-data
a7c10de6  2022-04-26 01:20:23  host-0      tyjmartin98    /stash-data
ad83daa9  2022-04-26 01:30:23  host-0      zkromer        /stash-data
41996173  2022-04-26 01:40:11  host-0      zonca          /stash-data
3d315108  2022-04-26 01:50:10  host-0      rahmanole      /stash-data
3a60b675  2022-04-27 01:00:33  host-0      pibion         /stash-data
47591543  2022-04-27 01:10:15  host-0      thathayhaykid  /stash-data
39b239f0  2022-04-27 01:20:24  host-0      tyjmartin98    /stash-data
efc30863  2022-04-27 01:30:22  host-0      zkromer        /stash-data
b1549dd9  2022-04-27 01:40:26  host-0      zonca          /stash-data
336bb791  2022-04-27 01:50:13  host-0      rahmanole      /stash-data
---------------------------------------------------------------------
12 snapshots

zonca commented 2 years ago

I have now set up a monitoring system. I'll keep an eye on it for a couple of days to see if it really works, then I plan to stop the backups for a day to verify that I am alerted about it.
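
As an illustration only (not necessarily how the monitoring here is implemented), a freshness check over the restic snapshots could be sketched in Python as below. It assumes the snapshots are obtained with `restic snapshots --json`, that each entry carries `time` and `tags` fields, and that each user is backed up under a tag equal to their username, as in the listing above. The function names and the 26-hour threshold are hypothetical choices for this sketch.

```python
import json
import re
import subprocess
from datetime import datetime, timedelta, timezone

# Matches an RFC 3339 timestamp; the fractional seconds are discarded because
# older Python versions cannot parse nanosecond precision.
_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def parse_time(value):
    """Parse a snapshot timestamp into a timezone-aware datetime."""
    match = _TIMESTAMP.match(value)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {value!r}")
    base, offset = match.groups()
    if offset == "Z":
        offset = "+00:00"
    return datetime.fromisoformat(base + offset)


def load_snapshots():
    """Run restic and return its snapshot list (needs the repository configured)."""
    result = subprocess.run(
        ["restic", "snapshots", "--json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def stale_tags(snapshots, expected_tags, now, max_age=timedelta(hours=26)):
    """Return the expected tags whose newest snapshot is missing or too old."""
    newest = {}
    for snap in snapshots:
        when = parse_time(snap["time"])
        for tag in snap.get("tags") or []:
            if tag not in newest or when > newest[tag]:
                newest[tag] = when
    return sorted(
        tag for tag in expected_tags
        if tag not in newest or now - newest[tag] > max_age
    )
```

A cron job could call `stale_tags(load_snapshots(), users, datetime.now(timezone.utc))` and raise an alert whenever the returned list is not empty. The threshold is set slightly above 24 hours so that a daily backup that runs a little late does not trigger a false alarm.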

zonca commented 2 years ago

Tutorial on how I set it up: https://zonca.dev/2022/04/monitor-restic-backups-kubernetes.html

zonca commented 2 years ago

ok, I see the pings coming in every 2 hours from the cronjob running in Kubernetes:

image

zonca commented 2 years ago

ok, it seems to be working fine. I have disabled backups for now; I'll verify that I get a notification and check that the alert delay is appropriate.
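
For readers unfamiliar with this kind of check, the logic behind a ping-based ("dead man's switch") monitor can be sketched roughly as follows. This is a simplified illustration, not the code of the service used here: the 2-hour period comes from the ping interval mentioned above, while the 1-hour grace time and the function name are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def check_status(last_ping, now,
                 period=timedelta(hours=2),
                 grace=timedelta(hours=1)):
    """Classify a check as 'up', 'late', or 'down' from its last ping time."""
    if last_ping is None:
        # No ping has ever been received.
        return "down"
    elapsed = now - last_ping
    if elapsed <= period:
        return "up"
    if elapsed <= period + grace:
        # The expected ping is overdue but still within the grace time.
        return "late"
    # Period plus grace time have passed: this is when a notification is due.
    return "down"


last = datetime(2022, 4, 28, 10, 0, tzinfo=timezone.utc)
print(check_status(last, last + timedelta(hours=4)))
```

With these settings the alert would fire about 3 hours after the last successful ping, which is the "delay" that has to be weighed against how quickly a broken backup should be noticed.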

zonca commented 2 years ago

Okay, I got a notification that the backup system was down, so I have re-enabled it.