det-lab / jupyterhub-deploy-kubernetes-jetstream

CDMS JupyterHub deployment on XSEDE Jetstream

allocation expiration #63

Open pibion opened 2 years ago

pibion commented 2 years ago

I got three "allocation will expire soon" notices (below). I think what I need to do is re-submit an allocation request, @zonca does that sound correct?

Project: PHY210008 Title: Building a sustainable and accessible future for dark matter analysis System: IU/TACC (Jetstream) Allocation: 400000.0 Used: 29892 Remaining: 370108 End Date: 2022-03-31

Project: PHY210008 Title: Building a sustainable and accessible future for dark matter analysis System: IU/TACC Storage (Jetstream Storage) Allocation: 2000.0 Used: 0 Remaining: 2000 End Date: 2022-03-31

Project: PHY210008 Title: Building a sustainable and accessible future for dark matter analysis System: Open Storage Network (OSN) Allocation: 100.0 Used: 0 Remaining: 100 End Date: 2022-03-31

zonca commented 2 years ago

We didn't use much so you can probably just request an extension

pibion commented 2 years ago

@zonca that seems reasonable. I can request a 6-month extension and the only information I need to provide is a comment. It does look like I'll need to submit a renewal after that; I assume the renewal process follows the same schedule as the science allocations?

Here's what I'm thinking for a comment:

I am requesting an extension for this allocation. Having analysis data available is currently the limiting factor in use of the platform and data is only recently becoming available. An additional six months of this allocation would allow testing of new types of analysis.

pibion commented 2 years ago

Okay, it looks like the Jetstream folks asked for an extension while switching our allocation; our end-date is now June 30. The XSEDE interface says @zonca has access to everything, let me know if you run into problems?

zonca commented 2 years ago

Very good. I tried a couple of hours ago and couldn't get access, so I activated my own access in the portal. I'll try again tomorrow.

pibion commented 2 years ago

@zonca it looks like I should plan on putting in a renewal request in the March 15 - April 15 window (https://portal.xsede.org/allocations/research#xracquarterly), does that look right to you?

zonca commented 2 years ago

XSEDE is going to end at the end of August, so I think you should ask the Jetstream team directly, via help@xsede.org, how to proceed.

pibion commented 2 years ago

@zonca I heard back from the help team and they suggest I submit a renewal request like normal. They're anticipating that allocations will transfer over smoothly through the XSEDE -> ACCESS change.

zonca commented 2 years ago

I didn't know that, thanks!

pibion commented 2 years ago

@zonca I'm starting on the renewal application at https://www.overleaf.com/read/vvtbxsgxjtfs (view-only link). I sent an invitation to the document to you, let me know if it doesn't come through?

pibion commented 2 years ago

@zonca I'm working on the renewal request and am wondering if our issues with slow spawning (?) due to Kubernetes have been resolved. Things have been faster but maybe it's just that there's so much RAM?

zonca commented 2 years ago

I think the delays in spawning were mostly due to slow volume attaching. It is possible that this is much faster on Jetstream 2, but I haven't tested in detail. The most important thing is that the infamous issue 40 (https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream/issues/40) doesn't affect Jetstream 2. I am keeping an eye on the deployment and it hasn't happened yet.

pibion commented 2 years ago

@zonca great. I've updated the progress report on the Overleaf renewal project. One thing I needed to report on was underutilization - the XSEDE dashboard reports 10% usage. I know the dashboard isn't accurate for Jetstream but it was quite a pain to get the actual usage (I wasn't able to do it, someone from TACC had to help).

So I'm assuming this is reasonably accurate for now.

Assuming this is ballpark accurate, do you think it would make sense to reduce the allocation? I do think we'll see increased usage, we've been hard at work making the data easier to get on the XSEDE platform. I'm not sure we'll see ten times more usage - it's not impossible but difficult to predict.

zonca commented 2 years ago

Yes I think a smaller allocation is fine. Maybe even 50%. We can probably ask for a supplement if we need to.

pibion commented 2 years ago

@zonca great, I've updated the progress report. Please feel free to edit if you'd like. In particular, is there any new documentation you would recommend I link to?

zonca commented 2 years ago

The tutorials on redeploying to Jetstream 2 are relevant:

https://zonca.dev/2022/03/kubernetes-jetstream2-kubespray.html
https://zonca.dev/2022/03/jetstream2-jupyterhub.html

possibly dask gateway: https://zonca.dev/2022/04/dask-gateway-jupyterhub.html

maybe dask/object store: https://zonca.dev/2022/04/zarr-jetstream2.html

pibion commented 2 years ago

@zonca great, I've added these to the documentation list (not the object store though since I don't talk about that elsewhere) and reduced the allocation request to 200k SUs.

The entire application is ready for review, I'm planning on submitting before 1:30 PM Eastern (taking my gran shopping then and it takes a while).

pibion commented 2 years ago

@zonca interesting things about the application:

  1. No more "ECSS" option. I've requested SGCI, maybe they intend support to flow through them?
  2. Jetstream2 GPUs have to be requested separately from Jetstream2 CPU. In retrospect that makes sense! A GPU node would be nice.

zonca commented 2 years ago

@pibion yes, ECSS is over at the end of August, good idea to request SGCI, but I don't know exactly how it will work.

VM names and sizes are a bit different for JS2: https://docs.jetstream-cloud.org/general/vmsizes/

zonca commented 2 years ago

I've run my consumption calculator:

--------------- SU usage for the minimum scenario
1 m3.medium master - 0 m3.xl workers
8 SU/hour
192 SU/day
5,760 SU/month
--------------- SU usage for the average scenario
1 m3.medium master - 1 m3.xl workers
40 SU/hour
960 SU/day
28,800 SU/month
--------------- SU usage for the maximum scenario
1 m3.medium master - 3 m3.xl workers
104 SU/hour
2,496 SU/day
74,880 SU/month

zonca commented 2 years ago

@pibion you can also run it yourself, I put it in Colab: https://colab.research.google.com/drive/1wF4nyHTdfY1zpXzmFz_yQzdqYFBloEHQ?usp=sharing
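
For readers without access to the notebook, the numbers above can be reproduced with a few lines of Python. This is only a minimal sketch and not the code from the Colab notebook. It assumes 1 SU per vCPU-hour, 8 vCPUs for m3.medium, 32 vCPUs for m3.xl and a 30-day month, which together are consistent with the output quoted above. The names `SU_PER_HOUR`, `su_usage` and `report` are hypothetical.

```python
# Hypothetical sketch, not the code from the Colab notebook.
# Assumed rates: 1 SU per vCPU-hour, m3.medium = 8 vCPUs, m3.xl = 32 vCPUs.
SU_PER_HOUR = {"m3.medium": 8, "m3.xl": 32}
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption that matches the monthly figures in the thread


def su_usage(n_workers, master="m3.medium", worker="m3.xl"):
    """Return (SU/hour, SU/day, SU/month) for one master plus n_workers workers."""
    per_hour = SU_PER_HOUR[master] + n_workers * SU_PER_HOUR[worker]
    per_day = per_hour * HOURS_PER_DAY
    per_month = per_day * DAYS_PER_MONTH
    return per_hour, per_day, per_month


def report(scenarios):
    """Format one text block per scenario, in the style of the output above."""
    lines = []
    for name, n_workers in scenarios.items():
        per_hour, per_day, per_month = su_usage(n_workers)
        lines.append(f"--------------- SU usage for the {name} scenario")
        lines.append(f"1 m3.medium master - {n_workers} m3.xl workers")
        lines.append(f"{per_hour:,} SU/hour")
        lines.append(f"{per_day:,} SU/day")
        lines.append(f"{per_month:,} SU/month")
    return "\n".join(lines)


print(report({"minimum": 0, "average": 1, "maximum": 3}))
```

Changing the worker counts in the dictionary passed to `report` gives the cost of other cluster sizes under the same assumptions.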

pibion commented 2 years ago

@zonca got it, I think I've updated everything. I'm asking for 250k SUs on Jetstream2 CPU and 100k SUs on Jetstream2 GPU.
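
As a rough sanity check on the CPU request, the 250k SUs can be compared with the monthly rates from the calculator output quoted earlier in the thread. This is a sketch under the assumption that the cluster runs continuously at one scenario's rate for the whole period, which real usage will not do. The helper `months_of_runway` is a hypothetical name.

```python
# Monthly SU rates taken from the calculator output quoted earlier in the thread.
MONTHLY_SU = {"minimum": 5_760, "average": 28_800, "maximum": 74_880}
REQUESTED_CPU_SU = 250_000  # Jetstream2 CPU request mentioned in the comment above


def months_of_runway(allocation_su, monthly_su):
    """Months an allocation lasts if usage stays at a constant monthly rate."""
    return allocation_su / monthly_su


for name, rate in MONTHLY_SU.items():
    months = months_of_runway(REQUESTED_CPU_SU, rate)
    print(f"{name}: {months:.1f} months")
```

Under these assumptions the request covers roughly 8.7 months at the average rate and about 3.3 months at the maximum rate, so a full year at the average rate would need either some time at the minimum configuration or a supplement.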

pibion commented 2 years ago

@zonca okay we're submitted! I added a request for Jetstream2 GPU and consolidated the ECSS and SGCI sections into one SGCI section.

pibion commented 2 years ago

And we have been approved! Most useful comment is that I did not provide information about success metrics, such as number of people using the platform, frequency of workshops, and typical number of people at a workshop.

zonca commented 2 years ago

Very good. If we have low usage, it is also useful to explain why. For us it is generally just that we had fewer concurrent users than expected.