Closed by @jmunroe 6 months ago
@jmunroe I greatly appreciate that level of detail in this issue, nice!!
- Standardize on a ~8 GB RAM, ~1.0 CPU machine type per user with four users / node (using "small" machines).
I consider this too conservative relative to user requirements; it would be better to start with a smaller requested amount of memory/CPU in order to have lower cloud costs and startup times (note that a request is a k8s term for the guaranteed amount when scheduling user containers to run on nodes).
Concretely, I'd suggest a 0.5GB/1GB request/limit by default, adding 1/2 GB, 2/4 GB, and 4/8 GB request/limit options that could be chosen. That way we could fit 64 users per small node by default, or 32 with the 1/2 GB request/limit choice. For comparison, the utoronto hub provides all users with a 1/2 GB request/limit, but on average their usage is around 250-400 MB per user, and the nodes' CPU utilization is commonly around 10%.
If we go with only a single resource allocation option and provide a high amount of memory per user, it's a breaking change to lower it later, but not a breaking change to increase it. In the long run, for a large set of users where some groups/classes need more than others, it's also cost-inefficient to provide only one option sized for the most memory-hungry user. And if only one option is available when that happens, introducing more options later adds some complexity.
EDIT: more concrete proposal on values can be found at https://github.com/2i2c-org/infrastructure/pull/3629#discussion_r1463221224
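As a concrete illustration, the options above could be expressed as a JupyterHub KubeSpawner profile list. This is a hypothetical sketch, not the actual 2i2c configuration; display names and structure are illustrative:

```python
# Hypothetical sketch of the proposed request/limit options as a
# KubeSpawner profile list; values mirror the comment above, but this
# is not the actual 2i2c configuration.
profile_list = [
    {
        "display_name": "0.5 GB RAM (default)",
        "default": True,
        "kubespawner_override": {"mem_guarantee": "0.5G", "mem_limit": "1G"},
    },
    {
        "display_name": "1 GB RAM",
        "kubespawner_override": {"mem_guarantee": "1G", "mem_limit": "2G"},
    },
    {
        "display_name": "2 GB RAM",
        "kubespawner_override": {"mem_guarantee": "2G", "mem_limit": "4G"},
    },
    {
        "display_name": "4 GB RAM",
        "kubespawner_override": {"mem_guarantee": "4G", "mem_limit": "8G"},
    },
]
```

Each option keeps the limit at twice the guarantee, so memory is only contended when several users on a node burst at once.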
Thanks @consideRatio. These distinctions about requests/limits are something we will need to include in our community hub champion training. I suspect that @jnywong will be coming to you or others in @2i2c-org/engineering to ask more questions!
I suppose I'm surprised that typical RAM usage is typically only 250-400MB per user. Does that pattern hold for research hubs as well?
> I suppose I'm surprised that typical RAM usage is typically only 250-400MB per user. Does that pattern hold for research hubs as well?
It's super hard to say, but until users start processing data, memory use is low. When they do process data, it can be anything really, and RAM can be used temporarily or for longer durations.
I'd like to see communities default to low requests of memory, get help to understand when they run into limits and how it can look, and finally be able to relatively easily increase it when needed.
Today, I had a short conversation with @AIDEA775 about some pieces of information from the Latam community usage that he wants to bring into the conversation (and potentially get more info about), so I added this issue to the upcoming sprint to foster the discussion and assigned it to @AIDEA775 alongside the others already discussing it.
I mostly agree with @consideRatio's proposal:
My proposal is to use this formula and provide options requesting 0.5, 1, 2, and 4 GB of memory, representing ~1/64, ~1/32, ~1/16, and ~1/8 of the node.
- mem request 0.5G
- mem limit 1G (twice the requested memory)
- cpu request 3.6 / 64 (~share of allocatable CPU)
- cpu limit max(1, 4 / 64 * 8) (roughly eight times the requested CPU, ignoring the ~400m headroom, but at least 1 CPU)
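These bullet points can be checked with a small calculation. The 3.6 allocatable-vCPU figure, the 2x memory limit, and the at-least-1-CPU floor come from the list above; everything else is arithmetic:

```python
# Sketch of the proposed per-user resource formula. Assumes a node with
# 4 vCPU, of which ~3.6 are allocatable to users (from the comment above).
ALLOCATABLE_CPU = 3.6
NODE_CPU = 4

def option(mem_request_gb, share):
    """Build one profile option; `share` is the node fraction, e.g. 1/64."""
    return {
        "mem_request_GB": mem_request_gb,
        "mem_limit_GB": mem_request_gb * 2,         # limit = twice the request
        "cpu_request": ALLOCATABLE_CPU * share,     # share of allocatable CPU
        "cpu_limit": max(1, NODE_CPU * share * 8),  # ~8x request, min 1 CPU
    }

smallest = option(0.5, 1 / 64)
```

For the smallest option this yields a 0.05625 vCPU request and a 1 vCPU limit, matching the bullets above.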
I also note that few users (almost none, for now) simultaneously use the Catalyst clusters. In the last months, Grafana reports only 1-2 concurrent users, with a single peak of 14 users on one day (in the latam cluster).
https://grafana.pilot.2i2c.cloud/d/hub-dashboard/jupyterhub-dashboard?orgId=1&var-PROMETHEUS_DS=b75a13ba-abf3-442f-8b04-00824593c07c&var-hub=All&from=now-6M&to=now&viewPanel=3 https://grafana.pilot.2i2c.cloud/d/hub-dashboard/jupyterhub-dashboard?orgId=1&var-PROMETHEUS_DS=bf57840d-2ffb-45e4-bed2-3679c1ea2cdf&var-hub=All&from=now-6M&to=now&viewPanel=3
There is little usage of nodes in the user-node-pool:
https://grafana.pilot.2i2c.cloud/d/MMHgC_Qnz/cluster-information?orgId=1&var-PROMETHEUS_DS=b75a13ba-abf3-442f-8b04-00824593c07c&from=now-6M&to=now&viewPanel=6 https://grafana.pilot.2i2c.cloud/d/MMHgC_Qnz/cluster-information?orgId=1&var-PROMETHEUS_DS=bf57840d-2ffb-45e4-bed2-3679c1ea2cdf&from=now-6M&to=now&viewPanel=6
This means that generally, every time a user logs in, they need to wait for the spawning of a new node.
The `n2-highmem-4` instance has 4 vCPU and 32G of RAM. If there are one or two users each using 1GB and 1/8 vCPU, we are "wasting" (and paying for) the other ~30GB of the node. Therefore, I believe the `n2-highmem-2` instance (2 vCPU, 16GB) is sufficient for the user-node-pool.
I'm considering implementing these options:
Node type | Max users on single node | CPU display | CPU guarantee | CPU limit | RAM guarantee | RAM limit |
---|---|---|---|---|---|---|
n2-highmem-2 | 32 | 1/16 | 0.05625 | 1 | 0.5G | 1G |
n2-highmem-2 | 16 | 1/8 | 0.1125 | 1 | 1G | 2G |
n2-highmem-2 | 8 | 1/4 | 0.225 | 2 | 2G | 4G |
n2-highmem-2 | 4 | 1/2 | 0.45 | 2 | 4G | 8G |
n2-highmem-2 | 2 | 1 | 0.9 | 2 | 8G | 16G |
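The table can be reproduced programmatically. The ~1.8 allocatable-vCPU figure below is an assumption back-derived from the guarantees shown (0.05625 x 32 = 1.8), not a published GKE number:

```python
# Sketch reproducing the n2-highmem-2 table above. ALLOC_CPU is
# back-derived from the table's guarantees, not a published GKE value.
ALLOC_CPU = 1.8  # of 2 vCPU, after system/daemonset headroom

rows = [
    {
        "max_users": max_users,
        "cpu_guarantee": ALLOC_CPU / max_users,
        "ram_guarantee_G": ram_gb,
        "ram_limit_G": ram_gb * 2,  # RAM limit is twice the guarantee
    }
    for max_users, ram_gb in [(32, 0.5), (16, 1), (8, 2), (4, 4), (2, 8)]
]
```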
The AWS equivalent would be `r5.large`.
That said, I also considered reusing the support node, as it is already running all the time, so users don't need to wait for the node to start up. In terms of costs, I guess it will be very similar because the user nodes do not remain on for long periods.
I'm wondering how much extra complexity is added if the node pools are unified. For now, I believe we can discard this option.
Related issues:
That's a great insight @AIDEA775! These Catalyst hubs are not on separate clusters and are all being paid from the same cloud billing account. While each individual hub may have relatively low usage in terms of number of users, there is no reason to have separate node pools for different hubs. (This was probably already obvious to @AIDEA775, but it was a very clarifying perspective change for me in terms of 'who' needs to have already started a Jupyter server session so that the second user experiences a quick startup time.)
I will defer to @AIDEA775 on whether we use `n2-highmem-2` or `n2-highmem-4` machines. That information should not be visible to the user, and we can adjust based on demand at a future time.
Wieee great work on this @AIDEA775!!!
Thanks for your comments @consideRatio!
> AWS pod limits: on use of node selector to pick instance type
Thanks! I didn't know about that!
> Single node pool strategy: use an `n2-highmem-2` node pool, declaring a min size of 1.
If I understand correctly, this means maintaining one node running on the user node pool all the time, right? I don't know if it's worth it; the community will pay for two instances which will idle most of the time.
Also, regarding `n2-highmem-2` vs `n2-highmem-4`, I'm exploring the costs in GCP, and the user nodepool costs are relatively low compared to the core nodepool costs, as it is used sporadically. I think it's not worth optimizing.
I've been investigating some other options/strategies for reducing startup times:
Temporary session: The idea is to start a small pod upon user request for fast startup, allowing the user to begin working while a new node boots up. Once the new node is ready, perform a blue-green deploy to transition the user to the permanent pod. This approach may be overly complex.
Cache images in the core node: This strategy reduces both time and costs as there is no need to download images from outside the cluster. There are some tools which do this.
Warm pool: I think this is the most viable option. At least in AWS, we can maintain a warm node pool with a size of 1. This node remains stopped, incurring no costs, until needed. Once it's no longer required, a reuse policy can stop the instance and return it to the warm pool, hopefully retaining the cached images.
Docs: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-warm-pools.html Launched in 2021: https://aws.amazon.com/es/blogs/compute/scaling-your-applications-faster-with-ec2-auto-scaling-warm-pools/
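For reference, a warm pool like the one described can be attached to an existing Auto Scaling group with the AWS CLI. This is only a sketch: the group name is hypothetical, and the exact sizing would depend on the cluster autoscaler setup.

```shell
# Attach a warm pool of Stopped instances (no compute billing while
# stopped) to a hypothetical Auto Scaling group, reusing instances on
# scale-in so cached images are hopefully retained.
aws autoscaling put-warm-pool \
  --auto-scaling-group-name user-node-pool-asg \
  --pool-state Stopped \
  --min-size 1 \
  --instance-reuse-policy '{"ReuseOnScaleIn": true}'
```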
I will read more about the 6th option and investigate if GCP has an equivalent.
I have been testing values for memory and CPU to fit 4 users on an `n2-highmem-4` node. I realized this:
If I sum the pod requests, the total is 437mCPU, whereas the console indicates that 433mCPU was requested.
Fixing the values:
Max users on n2-highmem-4 | CPU display | CPU guarantee | CPU limit | RAM guarantee | RAM limit |
---|---|---|---|---|---|
64 | 1/16 | 0.054 | 1 | 0.4258G | 1G |
32 | 1/8 | 0.108 | 1 | 0.8516G | 2G |
16 | 1/4 | 0.217 | 2 | 1.7033G | 4G |
8 | 1/2 | 0.435 | 2 | 3.4067G | 8G |
4 | 1 | 0.870 | 2 | 6.8134G | 16G |
The memory was calculated as:

    RAM guarantee = (allocatable RAM - overhead RAM) / max users per node

and the CPU, analogously, as:

    CPU guarantee = (allocatable CPU - overhead CPU) / max users per node

where:

- allocatable CPU is 3.92 (same source as the RAM figure)
- overhead CPU is 0.437: the sum of pod requests displayed in the Google Cloud Console, excluding the Jupyter pods

The CPU guarantee needs to be truncated to 3 decimals; otherwise, the numbers will be rounded.
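The CPU side of this calculation can be sketched as follows (3.92 and 0.437 are taken from the comment; the truncation mirrors the 3-decimal rule):

```python
import math

# Values from the comment: allocatable vCPU on an n2-highmem-4, and the
# sum of non-Jupyter pod requests seen in the Google Cloud Console.
ALLOC_CPU = 3.92
OVERHEAD_CPU = 0.437

def cpu_guarantee(max_users):
    """Per-user CPU guarantee, truncated (not rounded) to 3 decimals."""
    raw = (ALLOC_CPU - OVERHEAD_CPU) / max_users
    return math.floor(raw * 1000) / 1000
```

Truncating (rather than rounding) keeps the sum of guarantees at or below the allocatable CPU, so all users still fit on the node.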
Now the users fit correctly on one node.
Okay no, this one:
To simplify the numbers for users, we can "reduce" the "total" RAM to 24GB and 3 vCPU. So the numbers using an `n2-highmem-4` node would look like:
Max users | Option display | CPU guarantee | CPU limit | RAM guarantee | RAM limit |
---|---|---|---|---|---|
78 | 375 MB RAM, ~1/20 CPU | 0.046 (3/64) | 1 | 366210K (375MB) | 1G |
39 | 750 MB RAM, ~1/10 CPU | 0.093 (3/32) | 1 | 732421K (750MB) | 2G |
19 | 1.5 GB RAM, ~1/5 CPU | 0.187 (3/16) | 2 | 1464843.75K (1.5GB) | 3G |
9 | 3 GB RAM, ~1/2 CPU | 0.375 (3/8) | 2 | 2929687.5K (3GB) | 6G |
4 | 6 GB RAM, 3/4 CPU | 0.750 (3/4) | 2 | 5859375K (6GB) | 12G |
2 | 12 GB RAM, 1.5 CPU | 1.500 (3/2) | 3 | 11718750K (12GB) | 24G |
We don't lie about the RAM reserved, but we do about the CPU, because 3/32 is an odd number not really meaningful to the user, whereas 1/10 can be a useful approximation.
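The simplified scheme can be sketched as below; the 24 GB / 3 vCPU figures are the "imaginary node" from the comment, and the truncation follows the 3-decimal rule discussed earlier:

```python
import math

# The "imaginary node" presented to users: simplified totals that fit
# within a real n2-highmem-4's allocatable resources.
SIMPLE_RAM_MB = 24_000  # 24 GB
SIMPLE_CPU = 3.0

def profile(divisor):
    """One profile option: a 1/divisor slice of the simplified node."""
    return {
        "ram_guarantee_MB": SIMPLE_RAM_MB / divisor,
        # truncate to 3 decimals so the displayed guarantee never rounds up
        "cpu_guarantee": math.floor(SIMPLE_CPU / divisor * 1000) / 1000,
    }
```

Dividing by 64, 32, 16, 8, 4, and 2 reproduces the RAM and CPU guarantee columns of the table above.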
Then, some cool things happen: we can fit 4 users with 6GB and will have 5.3GB and 0.49 vCPU free, in which we can also fit:
I think this is the best trade-off between simple numbers and maximum utilization of one node. What do you think @2i2c-org/engineering?
This is great work, @AIDEA775. In particular, I appreciate you looking at usage data to pick the size of the nodes!
I would like you to explore using the script in https://github.com/2i2c-org/infrastructure/tree/master/deployer/commands/generate/resource_allocation to pick the memory and CPU numbers once you have the instance type determined. It should generate appropriate values for you to use, taking into account some overhead + what daemonsets are being used. The script itself probably needs adjustments, but I think systematizing it is very important so we can have a standard set of resource allocations to provide people that 'fit' users appropriately on nodes.
> We don't lie with the RAM reserved, but we do with the CPU, because 3/32 is an odd number not really meaningful to the user, but 1/10 can be a useful approximation.
I'd actually like us to just state the numbers directly as it's otherwise extremely hard to meaningfully reason about what numbers say and what they mean. The actual values we set are also shown in the jupyterlab interface. In general I think it is more confusing for users to see differing numbers in the profile list and in the jupyterlab interface than to see some non-rounded numbers. There's more rationale for this in https://github.com/2i2c-org/infrastructure/issues/3584.
This is a fantastic discussion. I don't think the choice of profiles needs to be perfect -- I'd rather roll them out when they are good enough and @2i2c-org/engineering approves, and change things later if needed.
In terms of implementation, should a common profile list be set up in the `common.values.yaml` files in both the `catalystproject.africa` and `catalystproject.latam` clusters? That should clean up each of the individual hub config files.
> I'd rather roll them out when they are good enough and @2i2c-org/engineering approves and change things later if needed.
This is perfectly fine with me too!
@jmunroe, @AIDEA775, @yuvipanda, and @consideRatio, circling back on this.
Are the specs below the normalized offerings needed to create hubs for the Catalyst communities?
Currently we have these 4 catalyst hub requests:
Spec #1 has evolved considerably based on the conversation that has occurred in this issue. I think @AIDEA775 should prepare the formal specification of what is being proposed. Perhaps this can be done most precisely by creating a PR to change the configuration of staging.latam and staging.af. I think that for the Catalyst Hubs, small, medium, and large sizes are a sufficient number of choices. (Only use every second row in the tables given above.)
The images of Jupyter-Scipy, Rocker Geospatial, and unlisted_choice can also be set up as a PR to the configuration.
The landing page is NOT something that needs to be considered by the engineering team at this time. Yes, I think we eventually need a non-English landing page, but that is not something I would want to block on before getting these hubs deployed. (@jnywong -- I think documenting how to modify the landing page and resolving how to handle non-English versions is something we will need to address, but it is separate from this issue.)
GitHub authentication is a good fit since we already have that set up. For our other hubs, we have the concept of a 'hub champion' who needs to sign off or provide input as a hub is being deployed -- I think that is a blocking step for getting these hubs set up. My big assumption here is that adding a new 'hub admin' user is something that can be done after the hub is deployed. (In the absence of a 'hub champion' and their GitHub id, management of these Catalyst Hubs should be done by @2i2c-org/partnerships-and-community-guidance, at least in the very short term.)
@jnywong is actively developing training materials that will target this default configuration for Catalyst Hubs.
> Currently we have these 4 catalyst hub requests:
Just a quick clarification, those 4 issues are not "real" hub requests until a new hub request GH issue is created on this very same repo (unless we change the DoR we agreed on in the past and failed to properly enforce). Actually, we have just one "real" request here: https://github.com/2i2c-org/infrastructure/issues/3740.
For completeness, here are three Catalyst Project hubs that need to be deployed pending finalization of this issue:
Western Cape https://github.com/czi-catalystproject/Project-Board/issues/132 is not an active hub request at this time.
In AWS, using an `r5.xlarge` instance and the `profileList` proposed in this comment, we can fit not 4 but 5 users with 6GiB of RAM, utilizing 100% of the vCPUs. This is because AWS nodes have slightly more allocatable RAM than GCP's. Only 1.73GiB is wasted (6% of the node).
Since I'm using an imaginary machine with 24GiB and 3vCPU, I can simply copy-paste the options because they don't depend on the real node capacity.
> I think that for the Catalyst Hubs a small, medium, and large size are a sufficient number of choices.
oki!
> We don't lie with the RAM reserved, but we do with the CPU, because 3/32 is an odd number not really meaningful to the user, but 1/10 can be a useful approximation.

> I'd actually like us to just state the numbers directly as it's otherwise extremely hard to meaningfully reason about what numbers say and what they mean. The actual values we set are also shown in the jupyterlab interface. In general I think it is more confusing for users to see differing numbers in the profile list and in the jupyterlab interface than to see some non-rounded numbers. There's more rationale for this in #3584.
Oh, you're displaying the memory limit! And setting the limit equal to the guarantee is okay, I think. The `jupyter/scipy-notebook` image doesn't have the memory monitor in the status bar, but it makes sense that if the user selects "3GB RAM", they can only use up to 3GB.
I'm also considering simply not displaying the CPU in the options, only the RAM. Would that be too simple?
A really nice discussion so far! It seems there is a consensus about a formal specification for the Catalyst hubs so I would like to second what @jmunroe said in the linked comment as a way to formalize a DoD for this issue: https://github.com/2i2c-org/infrastructure/issues/3631#issuecomment-1979514766. Particularly these below pieces:
> I think @AIDEA775 should prepare the formal specification of what is being proposed. Perhaps this can be done most precisely by creating a PR to change the configuration staging.latam and staging.af. I think that for the Catalyst Hubs a small, medium, and large size are a sufficient number of choices.

> The images of Jupyter-Scipy, Rocker Geospatial, and unlisted_choice can also be set up as a PR to the configuration.

> The landing page is NOT something that needs to be considered by the engineering team at this time.

> In the absence of a 'hub champion' and their GitHub id, management of these Catalyst Hubs should be done by the @2i2c-org/partnerships-and-community-guidance, at least in the very short term
@haroldcampbell I have added this issue to the new Eng board alongside the newly created hub requests listed in https://github.com/2i2c-org/infrastructure/issues/3631#issuecomment-1979949331.
Cool. Thanks @damianavila. This issue has taken us 33 working days (Jan 22 - Mar 7) since it was initially raised by @jmunroe.
I am feeling that there is value in doing a post-mortem, as I suspect this could have been handled more effectively. @yuvipanda, @AIDEA775, @consideRatio and @jmunroe, do we need to do a post-mortem?
I'm keen to facilitate this next week if we believe there is value in doing a post-mortem.
Adding [Catalyst-Africa] New lot of hubs (kush, wits and molerhealth) #3808 for visibility.
The latest referenced hubs were deployed and we even had a retrospective about this whole process, so I am tempted to close this one now. Feel free to re-open if you disagree with me.
We are making progress deploying Catalyst Project hubs. Here's what we have deployed to date:
Catalyst Project, LatAm (GCP: southamerica-east1)
Catalyst Project, Africa (AWS: af-south-1)
I'd like to normalize what we are offering to Catalyst communities (until a particular community has different needs).
Our Collaborative Lesson Training Development can then assume that this is the hub configuration a community starts from. Potential items to train community champions on: