pibion closed this issue 2 years ago
Good, I will add one more. It seems like Terraform, which I use to manage the resources, has lost the state. I am surprised at how fragile it is.
Anyway, I am debugging the issue and will post an update here when I manage to add a second worker node.
Asked about Magnum, will update here. They also have a new project for 1-click Kubernetes installs named CACAO, but it currently only supports manual scaling.
I investigated the Terraform issue; it looks like it is a known problem that the old Terraform version I am using can "lose the state" :-1: Unfortunately, when I tried to use a newer version of Terraform, it was not supported by Kubespray. So I would need to redeploy, then restore all the user volumes and the data volume.
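(Not from the original thread, but as a mitigation for fragile local state: Terraform can export a copy of its current state with `terraform state pull`, which gives you something to restore from if the state file is lost. A minimal sketch, assuming a working Terraform workspace; the backup filename is just an example:)

```bash
# Snapshot the current state to a dated backup file
terraform state pull > tfstate-backup-$(date +%F).json

# If the state is later lost or corrupted, it can be restored with:
#   terraform state push tfstate-backup-YYYY-MM-DD.json
# (use with care: push overwrites the remote/local state)
```

Keeping periodic snapshots like this would at least avoid a full redeploy plus volume restore when the state goes missing.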
However, Jeremy estimates we could have Magnum in 2 or 3 weeks, so I suggest we wait for Magnum and redeploy then.
In the meantime, if we can't wait, we have 2 options:
@zonca definitely not urgent enough given the risk. I think changing to an 8 GB default config is a great solution.
ok, I deployed your 22.06.3 image
Mike Lowe from the Jetstream team said instead that Magnum could come in August, so let's think about how to proceed.
Hmm yeah that suggests "larger node" to me. Most of the users on Jetstream are novice users and the fewer unnecessary issues the better.
I'm a big proponent of "git push the changes you like, you never know when the chaotic neutral Data Fairy will make your information vanish."
Also, I'd rather break things now while we have relatively few users (and I'm certain nobody is going to lose their thesis work), depending on how usage grows then we may want to think about how to harden the redeployment.
I say we go for it, if it breaks, it'll be a great learning experience. Mwa hahahaha
ok, maybe I can try the resizing on Friday, and hopefully can put it back online by Monday or Tuesday if it breaks.
I think it would also be good to either take away the "full node" option, or re-deploy in a way that someone choosing the "full node" option still leaves two to three default instances available (8 GB is likely fine for default).
Looking at our remaining allocation, I'd advocate for a full node plus three default instances - I'd love to have a full node available as we work on data analysis tools.
And right now it's a weekly use-case where three people are on default instances at the same time.
ok, I'll target for this Friday
@pibion I'll start now, JupyterHub will probably be down the whole day
actually, the resizing worked very quickly without destroying the instance, which is good. However, there are some issues with the CVMFS volume; I'll investigate and post here when it's fixed.
I think it was just the CVMFS pod that was restarting; it seems to be working now, please let me know if you find issues. The instance now has 60 GB of RAM and can sustain 3x8 GB + 1x24 GB. We have some RAM left over; let's leave 6 GB to the OS. @pibion how do you want to split the 54 GB available?
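(A quick sanity check of the figures above; plain Python, just arithmetic. The 6 GB OS reserve is the suggestion from this comment, not a measured requirement:)

```python
# Sanity-check the RAM budget on the resized 60 GB node
# (figures from the thread: 6 GB reserved for the OS,
#  three 8 GB default pods plus one 24 GB large pod).
total_gb = 60
os_reserve_gb = 6
available_gb = total_gb - os_reserve_gb   # RAM left for user pods
mix_gb = 3 * 8 + 1 * 24                   # current proposed mix

print(available_gb)  # 54
print(mix_gb)        # 48
assert mix_gb <= available_gb             # the mix fits with 6 GB to spare
```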
@zonca could we bump the 8GB instances to 15GB, add one 15 GB instance, and then give the rest to the large-cpu-memory instance?
So 4x15GB + 1x40GB?
We only have one virtual machine running, with 60 GB. So we can do 15 and 40, which means we can either have three default instances running, or one large instance and one default instance.
Definitely three default instances, sorry, misunderstood the question.
sorry, I confused myself, I resized to an m3.xl
https://docs.jetstream-cloud.org/general/vmsizes/#jetstream2-cpu
So we have 125 GB of RAM. I don't think they are charging yet on Jetstream 2, so it is fine; we will most probably rebuild from scratch in September/October when we have Magnum.
So how do you want to split 125 GB of RAM?
How about 4x15GB + 2x30GB?
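(Not from the thread, but for reference: a split like 4x15 GB + 2x30 GB is typically exposed to users as JupyterHub profile options. A hedged sketch of what the Zero-to-JupyterHub `values.yaml` fragment might look like, assuming the standard `singleuser.profileList` mechanism; the display names are invented, and the 15/30 GB figures follow the proposal above:)

```yaml
singleuser:
  profileList:
    - display_name: "Default (15 GB RAM)"
      default: true
      kubespawner_override:
        mem_guarantee: "15G"
        mem_limit: "15G"
    - display_name: "Large (30 GB RAM)"
      kubespawner_override:
        mem_guarantee: "30G"
        mem_limit: "30G"
```

Setting `mem_guarantee` equal to `mem_limit` makes the scheduler reserve the full amount up front, so the node can never be overcommitted past the 4+2 mix.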
oh well I also added an extra large:
we could get a large 60 GB with 8 + 8 + 8 + 30 https://docs.jetstream-cloud.org/general/vmsizes/
@zonca I think the time has come for us to support more simultaneous instances!
:partying_face:
I think supporting four simultaneous default instances should be fine for now. Any word on the Magnum update?