det-lab / jupyterhub-deploy-kubernetes-jetstream

CDMS JupyterHub deployment on XSEDE Jetstream

Restore SuperCDMS JupyterHub from scratch #90

Open: zonca opened this issue 9 months ago

zonca commented 9 months ago

Instances of the old deployment are still on Jetstream 2 in "Shelved Offloaded" state. However, we prefer to deploy from scratch, since we had some issues with the old deployment and it is better to get the latest improvements.

zonca commented 9 months ago

@zonca will check the status of the allocation, do a test deployment and destroy it, then point Amy to the docs so she can execute the deployment.

We will start with a plain Kubernetes deployment, then add all the other features, in particular #84.

zonca commented 9 months ago

Testing the deployment: it works up to the Terraform step. However, I mistakenly released the IP address that supercdms.jetstream-cloud.org points to, so I opened a ticket.

Now testing the default Ansible deployment of Kubernetes. See also #92.

zonca commented 8 months ago

I am taking the opportunity of rebuilding the deployment to improve it. I am working on deploying a load balancer in front of JupyterHub, see https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/issues/45

zonca commented 8 months ago

I have a networking issue with the load balancer; I opened a ticket about it.

zonca commented 8 months ago

Still debugging with help from Jetstream support.

pibion commented 8 months ago

@zonca should I still give the instructions a try, or wait until the load balancer is debugged?

zonca commented 8 months ago

You can try it as it is now.

zonca commented 8 months ago

The issue was fixed by Jetstream support. Now I'll write a tutorial about this, then resume deploying the SuperCDMS JupyterHub.

zonca commented 8 months ago

I am finalizing the load balancer tutorial

zonca commented 7 months ago

I hit other issues with the load balancer; waiting for help from Jetstream support.

pibion commented 6 months ago

So I started trying to redeploy; is that okay in the current state? I went to https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream/blob/cdms/DEPLOY.md and tried to follow the "REDEPLOY.md" instructions, but I'm not sure which terminal prompt I should be using. I'm guessing it's a prompt on Jetstream?

zonca commented 5 months ago

@pibion I'll tag you on other issues for this, see the other notifications

zonca commented 3 months ago

@pibion I have restarted working on this. Deploying a load balancer is not reliable, so I am going to deploy without one, as it was before.

Recently we added support for clusters with both CPU and GPU nodes at the same time: https://www.zonca.dev/posts/2024-02-09-kubernetes-gpu-jetstream2

So I am deploying a test cluster with 1 CPU and 1 GPU node.
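With one CPU node and one GPU node in the same cluster, users need a way to choose where their notebook server runs. A minimal sketch, assuming the hub uses KubeSpawner's `profile_list`, is to offer a CPU profile and a GPU profile. The GPU profile requests one NVIDIA GPU through `extra_resource_limits` and pins the pod to GPU nodes via `node_selector`. The helper `make_profiles`, the image names and the node label are hypothetical placeholders, not the values used in this deployment.

```python
"""Sketch: KubeSpawner profiles for a cluster with a CPU node and a GPU node."""

from typing import Any, Dict, List


def make_profiles(
    cpu_image: str, gpu_image: str, gpu_node_label: Dict[str, str]
) -> List[Dict[str, Any]]:
    """Build a list usable as c.KubeSpawner.profile_list (hypothetical helper)."""
    return [
        {
            "display_name": "CPU only",
            "description": "Notebook server without a GPU",
            "default": True,  # preselected in the spawn form
            "kubespawner_override": {"image": cpu_image},
        },
        {
            "display_name": "GPU",
            "description": "Notebook server with 1 NVIDIA GPU",
            "kubespawner_override": {
                "image": gpu_image,
                # Request one GPU from the NVIDIA device plugin
                "extra_resource_limits": {"nvidia.com/gpu": "1"},
                # Only schedule on nodes carrying this (assumed) label
                "node_selector": gpu_node_label,
            },
        },
    ]


if __name__ == "__main__":
    # Hypothetical image names and node label; replace with the real ones.
    profiles = make_profiles(
        cpu_image="registry.example.org/tensorflow-notebook:cpu",
        gpu_image="registry.example.org/tensorflow-notebook:gpu",
        gpu_node_label={"gpu": "true"},
    )
    for profile in profiles:
        print(profile["display_name"], profile["kubespawner_override"])
```

In `jupyterhub_config.py` (or `hub.extraConfig` on Zero to JupyterHub) this would be wired in with `c.KubeSpawner.profile_list = make_profiles(...)`. Note that the CPU profile above does not keep pods off the GPU node; that would need an extra selector or taint, depending on how the nodes are labeled.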

I'll notify here when it is available for testing.

zonca commented 3 months ago

OK, I have a preliminary version of the deployment at:

https://kubejetstream-1.phy210008.projects.jetstream-cloud.org/

I am using a temporary URL for now, we will put it under supercdms.jetstream-cloud.org later.

For now I have deployed a plain JupyterHub with a TensorFlow image with GPU support and GitLab authentication. There are no shared data volumes yet.
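GitLab login is typically handled by the `GitLabOAuthenticator` from the `oauthenticator` package. The config fragment below is a sketch, assuming a standalone `jupyterhub_config.py` and an OAuth application already registered on GitLab. The environment variable names, the group name and the self-hosted GitLab URL are hypothetical; the callback URL uses the temporary address above.

```python
# jupyterhub_config.py fragment (sketch, not the deployment's actual config)
import os

c = get_config()  # noqa: F821  (get_config is injected when JupyterHub loads this file)

# Authenticate users through GitLab OAuth
c.JupyterHub.authenticator_class = "oauthenticator.gitlab.GitLabOAuthenticator"

# Credentials of the GitLab OAuth application (hypothetical env var names)
c.GitLabOAuthenticator.client_id = os.environ["GITLAB_CLIENT_ID"]
c.GitLabOAuthenticator.client_secret = os.environ["GITLAB_CLIENT_SECRET"]

# Must match the redirect URI registered in the GitLab application
c.GitLabOAuthenticator.oauth_callback_url = (
    "https://kubejetstream-1.phy210008.projects.jetstream-cloud.org/hub/oauth_callback"
)

# Only needed for a self-hosted GitLab instance (hypothetical URL)
# c.GitLabOAuthenticator.gitlab_url = "https://gitlab.example.org"

# Recent oauthenticator versions admit nobody unless an allow rule is set (hypothetical group)
# c.GitLabOAuthenticator.allowed_gitlab_groups = {"example-group"}
```

This is a config fragment meant to be loaded by JupyterHub, so it will not run on its own.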

@pibion is it OK if we talk about next steps on Tuesday, before or after the Kaitai call?

zonca commented 3 months ago

The next step is to try using one of the Singularity images, see https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream/issues/84