Open zonca opened 9 months ago
@zonca will check status of allocation, do a test deployment and destroy it, then point Amy to the docs so she can execute the deployment.
We will start with a plain Kubernetes deployment, then add the other features, in particular #84.
Testing the deployment: it works up to the Terraform step. However, by mistake I released the IP that supercdms.jetstream-cloud.org points to, so I opened a ticket.
Now testing the default Ansible deployment of Kubernetes. See also #92.
I am taking the opportunity of rebuilding the deployment to improve it. I am working on deploying a load balancer on top of JupyterHub, see https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/issues/45
I have a networking issue with the load balancer and opened a ticket about it.
Still debugging with the help of Jetstream support.
@zonca should I still give the instructions a try, or wait until the load balancer is debugged?
You can try it as it is now.
The issue was fixed by Jetstream support. Now I'll write a tutorial about this, then resume deploying the SuperCDMS JupyterHub.
I am finalizing the load balancer tutorial
I hit other issues with the load balancer and am waiting for help from Jetstream support.
So I started trying to redeploy - is that okay in the current state? I went to https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream/blob/cdms/DEPLOY.md and tried to follow the "REDEPLOY.md" instructions, but I'm not sure which prompt I should be using. I'm guessing it's a prompt on jetstream?
@pibion I'll tag you on other issues for this, see the other notifications
@pibion I resumed work on this. Deploying a load balancer is not reliable, so I am going to deploy without one, as it was before.
Recently we added support for clusters with both CPU and GPU nodes at the same time: https://www.zonca.dev/posts/2024-02-09-kubernetes-gpu-jetstream2
So I am deploying a test cluster with 1 CPU and 1 GPU node.
I'll notify here when it is available for testing.
ok, I have a preliminary version of the deployment at:
https://kubejetstream-1.phy210008.projects.jetstream-cloud.org/
I am using a temporary URL for now; we will put it under supercdms.jetstream-cloud.org later.
For now I just deployed a plain JupyterHub with a TensorFlow image with GPU support and GitLab auth. No data sharing volumes for now.
@pibion is it ok if we talk about next steps on Tuesday before or after the kaitai call?
The next step is to try using one of the Singularity images, see https://github.com/det-lab/jupyterhub-deploy-kubernetes-jetstream/issues/84
Instances of the old deployment are still on Jetstream 2 in the "Shelved Offloaded" state. However, we prefer to deploy from scratch, since we had some issues with the old deployment and it is better to get the latest improvements.