det-lab / jupyterhub-deploy-kubernetes-jetstream

CDMS JupyterHub deployment on XSEDE Jetstream

Support for more instances? #76

Closed · pibion closed this issue 2 years ago

pibion commented 2 years ago

@zonca I think the time has come for us to support more simultaneous instances!

:partying_face:

I think supporting four simultaneous default instances should be fine for now. Any word on the Magnum update?

zonca commented 2 years ago

good, I will add one more. It seems that Terraform, which I use to manage the resources, has lost its state; I am surprised at how fragile it is.

Anyway, I am debugging the issue and will post here once I manage to add a second worker node.

zonca commented 2 years ago

I asked about Magnum and will update here. They also have a new project for 1-click Kubernetes installs named CACAO, but it only supports manual scaling.

zonca commented 2 years ago

I investigated the Terraform issue; it looks like it is a known problem that the old Terraform version I am using can "lose the state" :-1: Unfortunately, when I tried to use a newer version of Terraform, it was not supported by Kubespray. So I would need to redeploy again, then restore all the user volumes and the data volume.

However Jeremy forecasts we could have Magnum in 2 or 3 weeks, so I suggest we wait for Magnum and redeploy then.

In the meantime, if we can't wait, we have two options: redeploy now and restore all the volumes by hand (risky), or switch to a smaller 8 GB default instance so more users fit on the current node.

pibion commented 2 years ago

@zonca definitely not urgent enough given the risk. I think changing to an 8 GB default config is a great solution.
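For reference, the 8 GB default would be a small change in the Zero to JupyterHub helm values. A minimal sketch, assuming the standard `singleuser` layout (this repo's actual config file may be organized differently):

```yaml
# Zero to JupyterHub helm values -- illustrative sketch only
singleuser:
  memory:
    limit: 8G       # hard per-user cap
    guarantee: 8G   # reserved at scheduling time, so four users need ~32 GB free on the node
```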

zonca commented 2 years ago

ok, I deployed your 22.06.3 image
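For reference, the user image and tag also live in the Zero to JupyterHub helm values. A minimal sketch; the image name here is a placeholder, since the thread doesn't show the real one:

```yaml
singleuser:
  image:
    name: example-org/cdms-jupyterlab   # placeholder -- actual image name not shown in this thread
    tag: "22.06.3"                      # the tag referenced in the comment above
```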

zonca commented 2 years ago

Mike Lowe from the Jetstream team now says Magnum could come in August instead, so let's think about how to proceed.

pibion commented 2 years ago

Hmm yeah that suggests "larger node" to me. Most of the users on Jetstream are novice users and the fewer unnecessary issues the better.

I'm a big proponent of "git push the changes you like, you never know when the chaotic neutral Data Fairy will make your information vanish."

Also, I'd rather break things now while we have relatively few users (and I'm certain nobody is going to lose their thesis work). Depending on how usage grows, we may want to think about how to harden the redeployment later.

I say we go for it; if it breaks, it'll be a great learning experience. Mwa hahahaha

zonca commented 2 years ago

ok, maybe I can try the resizing on Friday; if it breaks, I can hopefully have it back online by Monday or Tuesday.

pibion commented 2 years ago

I think it would also be good to either take away the "full node" option, or re-deploy so that someone choosing the "full node" option still leaves two to three default instances available (8 GB is likely fine for default).
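One common way to offer both a default and a "full node" option is KubeSpawner's `profileList` in the helm values. A hedged sketch with illustrative names and sizes (not necessarily this repo's actual config); note that `mem_guarantee` is what the scheduler uses to decide how many pods fit on the node, while `mem_limit` is the hard cap:

```yaml
singleuser:
  profileList:
    - display_name: "Default (8 GB)"     # illustrative size
      default: true
      kubespawner_override:
        mem_limit: 8G
        mem_guarantee: 8G
    - display_name: "Full node (24 GB)"  # illustrative size
      kubespawner_override:
        mem_limit: 24G
        mem_guarantee: 24G
```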

pibion commented 2 years ago

Looking at our remaining allocation, I'd advocate for a full node plus three default instances - I'd love to have a full node available as we work on data analysis tools.

And right now it's a weekly use-case where three people are on default instances at the same time.

zonca commented 2 years ago

ok, I'll aim for this Friday

zonca commented 2 years ago

@pibion I'll start now, JupyterHub will probably be down the whole day

zonca commented 2 years ago

actually, the resizing worked very quickly without destroying the instance, which is good. However, there are some issues with the CVMFS volume; I'll investigate and post here when it's fixed.

zonca commented 2 years ago

I think it was just the CVMFS pod restarting; it seems to be working now, but please let me know if you find issues. The instance now has 60 GB of RAM and can sustain 3x8 GB + 1x24 GB with some RAM left over; let's leave 6 GB to the OS. @pibion, how do you want to split the 54 GB available?
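On the "leave 6 GB to the OS" point: kubelet can enforce a reservation like that so user pods are never scheduled into it. A hedged sketch of the idea (not necessarily how Kubespray configures kubelet in this deployment):

```yaml
# KubeletConfiguration fragment -- illustrative only
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  memory: 6Gi   # held back for the OS and system daemons, excluded from pod scheduling
```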

pibion commented 2 years ago

@zonca could we bump the 8GB instances to 15GB, add one 15 GB instance, and then give the rest to the large-cpu-memory instance?

So 4x15GB + 1x40GB?

zonca commented 2 years ago

We only have one virtual machine running, with 60 GB. So we can do 15 and 40, which means we can either have three default instances running, or one large instance and one default instance.

pibion commented 2 years ago

Definitely three default instances. Sorry, I misunderstood the question.

zonca commented 2 years ago

sorry, I confused myself; I resized to an m3.xl:

[screenshot: Jetstream2 CPU flavor table]

https://docs.jetstream-cloud.org/general/vmsizes/#jetstream2-cpu

So we have 125 GB of RAM. I don't think they are charging yet on Jetstream 2, so it is fine. We will most probably rebuild from scratch in September/October when we have Magnum.

So how do you want to split 125 GB of RAM?

pibion commented 2 years ago

How about 4x15GB + 2x30GB?

zonca commented 2 years ago

oh well, I also added an extra large option.

zonca commented 1 year ago

(replying by email to the full-node suggestion above) we could get a large 60 GB instance split as 8 + 8 + 8 + 30: https://docs.jetstream-cloud.org/general/vmsizes/
