Closed dbroockman closed 10 months ago
Note: in the past we've had issues with needing to bump up compute because all students enter the notebook at the same time, e.g., see #4009 . Please allocate adequate resources :). Thank you!
Thanks for the context @dbroockman! I created a Google calendar event that allocates 2 extra spare nodes just before the start of the classes on Monday and Wednesday. I will keep the issue open in case you want to report issues with scale-up.
Thanks. Looking at #4009 from last year it looks like 8 nodes were initially allocated and that turned out not to be enough. So I am worried about whether 2 will be enough. Thoughts?
@dbroockman The two nodes that @balajialg is referring to are hot spares. This means that if n represents the number of nodes that your class fits on at any given time, there will be n+2 nodes online.
The cluster normally scales up nodes to match demand, so that if all nodes are occupied and one more user logs in, it will start up a new node. However, it takes a few minutes for each node to spin up, so that user will see a delay. When a lot of people try to start their servers at the same time, surpassing the rate at which nodes can start up, it can cause user-facing problems. We can configure the cluster to have spare nodes in reserve, so it can instantly make them available when new nodes are needed. The only downside to having these hot spares online all of the time is that they're doing nothing, which "wastes" resources when the rate of user server startups is low. So the middle ground is to schedule the creation of spare nodes when we know there will be a flood of users.
Thanks. Yes, the way my class works, all 350 students will be entering a notebook at the exact same time at 5pm on Mondays and Wednesdays. I'd really like to make sure we don't have students stuck on loading screens for 4-5 minutes, because it's a timed assignment where they only get 20 minutes to complete it. It will also be their first impression of JupyterHub.
Thanks @ryanlovett!
@dbroockman Based on last year's estimate, almost 100 pods (user servers) were packed into a single R Hub node. I am guessing the default node allocation for the R hub would be 2 nodes and then there are 2 hot spares allocated through the calendar event before the start of the class. Unless I am missing something, this should account for 350 students trying to log in between 4:30 and 5:15.
@ryanlovett Thoughts? Should we be more generous with hot spares?
@balajialg It appears that last semester, the default number of placeholders for r hub was 1. If we set it to 2 using the calendar, and if each node can accommodate 100 pods, then that means that there would be immediate capacity for 200 new servers, plus the unused capacity on the currently active nodes.
If 350 students are truly logging in at the same moment, every class, then we could set the number of spares to be 3 for a ~10 minute period around that moment. This would account for 400 users of R hub, though not all are polisci students. If fewer students are actually logging in, or if the ramp up is over a period of 10-15 minutes at the start of class and not at the same moment, then 2 seems reasonable.
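The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration: the 100 pods per node figure is the estimate quoted in this thread, and the function names and the `free_slots_on_active_nodes` parameter are made up for the example, not part of any hub configuration.

```python
import math

# Estimate quoted in this thread: roughly 100 user pods fit on one R hub node.
PODS_PER_NODE = 100


def immediate_capacity(spare_nodes, pods_per_node=PODS_PER_NODE):
    """Servers that can start without waiting for a new node to boot."""
    return spare_nodes * pods_per_node


def spares_needed(simultaneous_logins, free_slots_on_active_nodes=0,
                  pods_per_node=PODS_PER_NODE):
    """Hot spares required so that no login waits on a node spin-up.

    free_slots_on_active_nodes is a hypothetical input: the unused
    capacity on nodes that are already running.
    """
    shortfall = max(0, simultaneous_logins - free_slots_on_active_nodes)
    return math.ceil(shortfall / pods_per_node)


if __name__ == "__main__":
    for spares in (2, 3):
        print(f"{spares} spares -> {immediate_capacity(spares)} immediate slots")
    # Worst case: 350 logins at once and no free room on the active nodes.
    print("spares needed, no free slots:", spares_needed(350))
    # Same burst, assuming one active node's worth of room is still free.
    print("spares needed, 100 free slots:", spares_needed(350, 100))
```

With no free room on the active nodes, a true 350-login burst would need 4 spares under this estimate; 3 spares only cover it if about one node's worth of capacity is already free.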
Let me organize this a little...
Most anxious scenario:
More realistic scenario:
So this is me thinking out loud. A spare count of 3 is very conservative although perhaps okay for the first week. If the data shows that it is overkill, it can be reduced.
How much does each of these nodes cost to spin up / run for a couple of hours? I'd suggest overprovisioning to be safe, yes.
@dbroockman Based on the estimate for n2-highmem-8 in https://cloud.google.com/compute/all-pricing, each node should cost around $1 for a couple of hours (not super expensive). I guess the admins can verify in case I missed something in my estimate.
I increased the hot spares to 3 for now which should accommodate ~300 users based on our current understanding.
yep, that's about right. instance cost is ~$400/month, and with ~730 hours/month, we get ~$0.54/hr for each placeholder node.
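For reference, here is the same cost estimate as a small Python sketch. The $400/month and 730 hours/month figures are the approximations quoted above; the function names and the 2-hour window are illustrative assumptions, not actual billing data.

```python
# Approximate figures quoted in this thread, not actual billing data.
MONTHLY_INSTANCE_COST_USD = 400.0
HOURS_PER_MONTH = 730.0


def hourly_cost(monthly_cost=MONTHLY_INSTANCE_COST_USD,
                hours_per_month=HOURS_PER_MONTH):
    """Approximate cost of keeping one placeholder node online for an hour."""
    return monthly_cost / hours_per_month


def session_cost(spare_nodes, hours):
    """Approximate cost of holding spare nodes for one class session."""
    return spare_nodes * hours * hourly_cost()


if __name__ == "__main__":
    print(f"per node-hour: ${hourly_cost():.2f}")
    # Assumed window: 3 spares held for 2 hours around one class meeting.
    print(f"3 spares for 2 hours: ${session_cost(3, 2):.2f}")
```

At these rates, extra spares held for a short window around class cost only a few dollars per session, provided they are spun down afterwards.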
we just need to make sure that we spin these down when not needed...
Thanks. Given how cheap this is, I'd ask that you err on the side of more nodes during my class. Thank you!
@dbroockman How did the class go today? Did any students face issues with launching R Hub?
We generously provisioned placeholder nodes before the start of the class today (through the calendar event)
All good today thank you!
Great! Closing this issue for now. Please reopen if we need to troubleshoot an issue.
Course Name
Broockman, POLI SCI 3
Detailed Requirements
Flipped classroom course, ~350 students will be using R Datahub every Monday & Wednesday from 5pm-6:30pm.
Semester Details
Yes
Request Deadline
January 22, 5pm