2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org

[Epic] Standardize Catalyst Project Hubs #3631

Closed jmunroe closed 6 months ago

jmunroe commented 8 months ago

We are making progress deploying Catalyst Project hubs. Here's what we have deployed to date:

Catalyst Project, LatAm (GCP: southamerica-east1)

Catalyst Project, Africa (AWS: af-south-1)

I'd like to normalize what we are offering to Catalyst communities (until a particular community has different needs):

  1. Standardize on a ~8 GB RAM, ~1.0 CPU machine type per user, with four users per node (using "small" machines).
  2. Make both a community-maintained RStudio and a JupyterLab environment available (see the config sketch after this list).
  3. Include an unlisted_choice option so communities can test deploying their own images.
  4. Deploy a landing page (in English, Spanish, or Portuguese) that references back to the Catalyst Project website for additional information.
  5. Use GitHub authentication (and team membership in the Catalyst-Hubs GitHub organization) as the default, unless a community has elected to use a Google Suite style authentication.
  6. Use staging.latam and staging.af as 'reference' deployments, with GitHub authentication through Catalyst-Hubs, to allow investigation of the platform by potential new community champions.
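
For concreteness, here is a minimal sketch of what items 2, 3, and 5 could look like in z2jh-style helm values. The image tags, option keys, and org/team names are placeholders rather than the actual Catalyst configuration, and the exact nesting under 2i2c's charts may differ:

```yaml
jupyterhub:
  hub:
    config:
      JupyterHub:
        authenticator_class: github
      GitHubOAuthenticator:
        # placeholder org; real access checks would use Catalyst-Hubs team membership
        allowed_organizations:
          - Catalyst-Hubs
        scope:
          - read:org
  singleuser:
    profileList:
      - display_name: "Community environment"
        default: true
        profile_options:
          image:
            display_name: Image
            choices:
              jupyterlab:
                display_name: "JupyterLab (Python / SciPy)"
                kubespawner_override:
                  image: "quay.io/jupyter/scipy-notebook:latest"  # placeholder tag
              rstudio:
                display_name: "RStudio (Rocker)"
                kubespawner_override:
                  image: "rocker/binder:latest"  # placeholder; any RStudio-capable image
            unlisted_choice:
              enabled: true
              display_name: "Custom image (communities testing their own)"
              kubespawner_override:
                image: "{value}"
```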

Our Collaborative Lesson Training Development can then assume that this is the hub configuration a community is starting from. Potential items to train community champions on:

  1. User intro to a JupyterHub. The difference between RStudio or JupyterLab 'images/environments' and the overall JupyterHub platform. Cloud computing concepts like machine type, node, and autoscaling. When a hub is automatically shut down. Memory limits. Example cloud workflow demonstration.
  2. Admin intro to a JupyterHub. Explain authentication via Google Suite or via GitHub Teams. Activity: add a new user to a GitHub team. Using the JupyterHub admin page to see other users' activity. Explain running multiple servers for one user. Explain memory/CPU limits and the options available for larger machine types.
  3. Image management. Some combination of repo2docker, Binder, Docker, and configuring both Python and R environments. Use QGIS as an example showing that hubs can also support full desktop applications.
  4. Cloud computing and cost management. Using Grafana to measure usage. Options for shared storage in cloud computing.

consideRatio commented 8 months ago

@jmunroe I greatly appreciate that level of detail in this issue, nice!!

  • Standardize on a ~8 GB RAM, ~1.0 CPU machine type per user with four users / node (using "small" machines).

I consider this too cautious on behalf of the users' requirements; it would be better to start with a smaller requested amount of memory/CPU in order to have lower cloud costs and startup times (note that a request is a k8s term for the guaranteed amount used when scheduling user containers onto nodes).

Concretely, I'd suggest a 0.5 GB / 1 GB request/limit by default, and adding 1/2 GB, 2/4 GB, and 4/8 GB request/limit options that could be chosen. Like this we could fit 64 users per small node by default, or 32 with the 1/2 GB request/limit choice. For comparison, the utoronto hub provides all users with a 1/2 GB request/limit, but on average their usage is around 250-400 MB per user and the nodes' CPU utilization is commonly around 10%.
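
As a minimal sketch (assuming the standard z2jh `singleuser` settings; the CPU numbers are illustrative since only memory is specified above), the default could be expressed as:

```yaml
jupyterhub:
  singleuser:
    memory:
      guarantee: 0.5G   # the k8s "request": guaranteed when scheduling the pod
      limit: 1G         # hard cap; processes exceeding it get OOM-killed
    cpu:
      guarantee: 0.05   # illustrative small request, not specified above
      limit: 1
```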

If we go for only a single resource allocation option and provide a high amount of memory per user, it's a breaking change to lower it later, but it's not a breaking change to increase it later. In the long run, for a large set of users where some groups/classes need more than others, it's also cost-inefficient to provide only one option sized for the most memory-hungry user. And if only one option is available when that happens, it adds some complexity to introduce more options later.

EDIT: more concrete proposal on values can be found at https://github.com/2i2c-org/infrastructure/pull/3629#discussion_r1463221224

jmunroe commented 8 months ago

Thanks @consideRatio. These distinctions about requests/limits are something we will need to include in our community hub champion training. I suspect that @jnywong will be coming to you or others in @2i2c-org/engineering to ask more questions!

I suppose I'm surprised that typical RAM usage is only 250-400 MB per user. Does that pattern hold for research hubs as well?

consideRatio commented 8 months ago

I suppose I'm surprised that typical RAM usage is only 250-400 MB per user. Does that pattern hold for research hubs as well?

It's super hard to say, but until users start processing data, the memory use is low. When they do process data it can be anything really, and the RAM can be used temporarily or for longer durations.

I'd like to see communities default to low memory requests, get help understanding when they run into limits and what that looks like, and finally be able to increase them relatively easily when needed.

damianavila commented 8 months ago

Today I had a short conversation with @AIDEA775 about some pieces of information from the LatAm community's usage that he wants to bring into the conversation (and potentially gather more info), so I added this issue to the upcoming sprint to foster the discussion and assigned it to @AIDEA775 alongside the others already discussing it.

AIDEA775 commented 7 months ago

I mostly agree with @consideRatio's proposal:

My proposal is to use this formula and provide options requesting 0.5, 1, 2, and 4 GB of memory, representing ~1/64, ~1/32, ~1/16, and ~1/8 of the node.

  • mem request 0.5G
  • mem limit 1G (twice the requested memory)
  • cpu request 3.6 / 64 (~share of allocatable CPU)
  • cpu limit 4 / 64 * 8 (roughly eight times the requested CPU excluding the ~400m headroom, but at least 1 CPU)

I'd also note that there are few users (at least for now) simultaneously using the Catalyst clusters. In the last months, Grafana reports only 1-2 concurrent users, with a single peak of 14 users on one day (in the LatAm cluster).

https://grafana.pilot.2i2c.cloud/d/hub-dashboard/jupyterhub-dashboard?orgId=1&var-PROMETHEUS_DS=b75a13ba-abf3-442f-8b04-00824593c07c&var-hub=All&from=now-6M&to=now&viewPanel=3
https://grafana.pilot.2i2c.cloud/d/hub-dashboard/jupyterhub-dashboard?orgId=1&var-PROMETHEUS_DS=bf57840d-2ffb-45e4-bed2-3679c1ea2cdf&var-hub=All&from=now-6M&to=now&viewPanel=3

The nodes in the user node pool also see little use:

https://grafana.pilot.2i2c.cloud/d/MMHgC_Qnz/cluster-information?orgId=1&var-PROMETHEUS_DS=b75a13ba-abf3-442f-8b04-00824593c07c&from=now-6M&to=now&viewPanel=6
https://grafana.pilot.2i2c.cloud/d/MMHgC_Qnz/cluster-information?orgId=1&var-PROMETHEUS_DS=bf57840d-2ffb-45e4-bed2-3679c1ea2cdf&from=now-6M&to=now&viewPanel=6

This means that generally, every time a user logs in, they need to wait for the spawning of a new node.


The n2-highmem-4 instance has 4 vCPU and 32 GB of RAM. If there are only one or two users, each using 1 GB and 1/8 vCPU, we are "wasting" (and paying for) the other ~30 GB of the node.

Therefore, I believe the n2-highmem-2 instance (2 vCPU, 16 GB) is sufficient for the user node pool.

I'm considering implementing these options:

| Node type | Max users on a single node | CPU display | CPU guarantee | CPU limit | RAM guarantee | RAM limit |
|---|---|---|---|---|---|---|
| n2-highmem-2 | 32 | 1/16 | 0.05625 | 1 | 0.5G | 1G |
| n2-highmem-2 | 16 | 1/8 | 0.1125 | 1 | 1G | 2G |
| n2-highmem-2 | 8 | 1/4 | 0.225 | 2 | 2G | 4G |
| n2-highmem-2 | 4 | 1/2 | 0.45 | 2 | 4G | 8G |
| n2-highmem-2 | 2 | 1 | 0.9 | 2 | 8G | 16G |
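
Translated into z2jh-style profile options, a couple of rows of the table above could look roughly like the sketch below. The option names are placeholders; the schema follows KubeSpawner's `profile_options` / `kubespawner_override`:

```yaml
jupyterhub:
  singleuser:
    profileList:
      - display_name: "Standard environment"
        profile_options:
          resource_allocation:
            display_name: "Resource allocation"
            choices:
              mem_0_5:
                display_name: "0.5 GB RAM, ~1/16 CPU"
                kubespawner_override:
                  mem_guarantee: 0.5G
                  mem_limit: 1G
                  cpu_guarantee: 0.05625
                  cpu_limit: 1
              mem_1:
                display_name: "1 GB RAM, ~1/8 CPU"
                kubespawner_override:
                  mem_guarantee: 1G
                  mem_limit: 2G
                  cpu_guarantee: 0.1125
                  cpu_limit: 1
```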

The AWS equivalent would be r5.large


That said, I also considered reusing the support node, as it is already running all the time, so users don't need to wait for the node to start up. In terms of costs, I guess it will be very similar because the user nodes do not remain on for long periods.

I'm wondering how much extra complexity is added if the node pools are unified. For now, I believe we can discard this option.


Related issues:

https://github.com/2i2c-org/infrastructure/issues/3132

https://github.com/2i2c-org/infrastructure/issues/3584

jmunroe commented 7 months ago

That's a great insight @AIDEA775! These Catalyst hubs are not on separate clusters and are all paid for from the same cloud billing account. While each individual hub may have relatively low usage in terms of number of users, there is no reason to have separate node pools for different hubs. (This was probably already obvious to @AIDEA775, but it was a very clarifying perspective change for me in terms of 'who' needs to have already started a Jupyter server session so that the second user experiences a quick startup time.)

jmunroe commented 7 months ago

I will defer to @AIDEA775 on whether we use n2-highmem-2 or n2-highmem-4 machines. That information should not be visible to the user and we can adjust based on demand at a future time.

consideRatio commented 7 months ago

Wieee great work on this @AIDEA775!!!

On putting user pods on core nodes

> I also considered reusing the support node, as it is already running all the time, so users don't need to wait for the node to start up. In terms of costs, I guess it will be very similar because the user nodes do not remain on for long periods.

It could reduce cloud cost and also avoid startup times for the initial set of users. We refer to core nodes and user nodes, where the core nodes run pretty much everything except user workloads. Historically we've opted never to mix user workloads and core workloads, and I think it's a good call to stay with that decision consistently across all 2i2c clusters, to avoid introducing complexity that could incur operational costs and security/reliability challenges. If we were optimizing for cloud costs all the way, though, we wouldn't keep them separate.

On resource allocation choices

Excellent! I think this will be great - thank you for providing these very useful columns of information to look at, btw!

> | Node type | Max users on a single node | CPU display | CPU guarantee | CPU limit | RAM guarantee | RAM limit |
> |---|---|---|---|---|---|---|
> | n2-highmem-2 | 32 | 1/16 | 0.05625 | 1 | 0.5G | 1G |
> | n2-highmem-2 | 16 | 1/8 | 0.1125 | 1 | 1G | 2G |
> | n2-highmem-2 | 8 | 1/4 | 0.225 | 2 | 2G | 4G |
> | n2-highmem-2 | 4 | 1/2 | 0.45 | 2 | 4G | 8G |
> | n2-highmem-2 | 2 | 1 | 0.9 | 2 | 8G | 16G |

Use of `n2-highmem-2` instead of `n2-highmem-4` is reasonable I think, and the same holds for AWS, where the equivalent nodes are `r5.large` and `r5.xlarge`.

**AWS pod limits**

On AWS, small nodes may [run into pod limits](https://github.com/awslabs/amazon-eks-ami/blob/master/files/eni-max-pods.txt) as well, but I think it's still acceptable. You can only schedule 29 pods on an `r5.large` and 59 pods on an `r5.xlarge`. Subtracting 5-6 system pods, we can still schedule ~24 users on the small `r5.large` node, which is fine - it may even be fully consumed as long as not all users need only 0.5 GB.

With these CPU limits that never go above 2 CPU, it's also less strange to schedule users on a mix of `n2-highmem-2` and `n2-highmem-4` nodes, since users would never expect to be able to use 4 CPU, instead consistently expecting at most 2 CPU.

### Issue - Fewer max users

The machines don't provide an exact amount of memory, and after k8s has reserved some capacity, what remains is the allocatable CPU/memory. We can't request more than this. The allocatable node capacities observed in our k8s clusters are:

https://github.com/2i2c-org/infrastructure/blob/ef948a93374a64ab5e08d8a4947617741fc8ac74/deployer/commands/generate/resource_allocation/instance_capacities.yaml#L19-L27

Then we must add some margin for the system pods running on each user node as well, because they also make requests. Those requests can be reviewed here:

https://github.com/2i2c-org/infrastructure/blob/ef948a93374a64ab5e08d8a4947617741fc8ac74/deployer/commands/generate/resource_allocation/daemonset_requests.yaml#L47-L52

The remaining capacity is what should be divided up so that one part fits per user. A hurdle is that for a 2 CPU / 16 GB node, the overhead is relatively larger than for a 4 CPU / 32 GB node.

On startup times

> This means that generally, every time a user logs in, they need to wait for the spawning of a new node.

I appreciate that you observed this; it's something that has bugged me as well for low-activity clusters. And if that is the first experience users get, it doesn't encourage use of the cluster either. I have opened https://github.com/2i2c-org/infrastructure/issues/3260 about this.

I think this is worth acting on, especially with the resource allocation options provided above, where a quite narrow range of options is offered. I see two strategies to ensure a user node is started:

1. **Single node pool strategy** - Use an `n2-highmem-2` node pool, declaring minsize to 1.
2. **Mixed node pool strategy** - Use an `n2-highmem-2` node pool and an `n2-highmem-4` node pool, declaring a fixed size of 1 on the 2 CPU node pool and a dynamic size on the 4 CPU node pool.

Both options make startup time fast for the initial users. The second option is more suitable if we have many users on the higher memory options, where for example only 2-4 users would fit on a node. Then every 2nd-4th user would need to wait for a node startup, so using 4 CPU nodes would turn that into "every 4th-8th user waits for a node startup" instead, which I think is an improvement. Note that the 2 CPU nodes would be filled up first thanks to pod scheduling logic that fills up the node that is already the fullest.

I think either strategy 1 or strategy 2 is best, leaning towards 1 being better if, on average, eight or more users fit on the 2 CPU nodes.

On use of node selector to pick instance type

User pods aren't able to schedule on the core nodes; this comes from basehub's config of z2jh. User pods are required to be on nodes with `hub.jupyter.org/node-purpose=user` labels, and the `hub` pod etc. are required to be on nodes with `...=core` labels.

https://github.com/2i2c-org/infrastructure/blob/ef948a93374a64ab5e08d8a4947617741fc8ac74/helm-charts/basehub/values.yaml#L128-L142

Because of this, we wouldn't need a more explicit configuration of what instance type to schedule on. By omitting `node_selector`, we could allow user pods to schedule on user nodes of either the 2 CPU or the 4 CPU type. This would be relevant if we went for the mixed node pool strategy discussed above regarding startup times.
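
A sketch of the difference, assuming KubeSpawner-style overrides; the `node.kubernetes.io/instance-type` label is the standard Kubernetes one and the profile names are placeholders:

```yaml
singleuser:
  profileList:
    - display_name: "Standard environment"
      profile_options:
        resource_allocation:
          choices:
            small_pinned:
              display_name: "Small (pinned to 2 CPU user nodes)"
              kubespawner_override:
                mem_guarantee: 1G
                mem_limit: 2G
                node_selector:
                  node.kubernetes.io/instance-type: n2-highmem-2
            small_floating:
              display_name: "Small (any user node)"
              kubespawner_override:
                mem_guarantee: 1G
                mem_limit: 2G
                # no node_selector: scheduling is only constrained by basehub's
                # node-purpose=user affinity, so either user pool can be used
```
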
AIDEA775 commented 7 months ago

Thanks for your comments @consideRatio!

AWS pod limits
On use of node selector to pick instance type

Thanks! I didn't know about that!

Single node pool strategy To use a n2-highmem-2 node pool, declaring minsize to 1.

If I understand correctly, this means keeping one node running in the user node pool all the time, right? I don't know if it's worth it; the community would pay for two instances that would be idle most of the time.


Also, regarding n2-highmem-2 vs n2-highmem-4: I'm exploring the costs in GCP, and the user node pool costs are relatively low compared to the core node pool costs, as it is used only sporadically. I think it's not worth optimizing.


I've been investigating some other options/strategies for reducing startup times:

  1. Temporary session: The idea is to start a small pod upon user request for fast startup, allowing the user to begin working while a new node boots up. Once the new node is ready, perform a blue-green deploy to transition the user to the permanent pod. This approach may be overly complex.

  2. Cache images in the core node: This strategy reduces both time and costs as there is no need to download images from outside the cluster. There are some tools which do this.

  3. Warm pool: I think this is the most viable option. At least in AWS, we can maintain a warm node pool with a size of 1. This node remains stopped, incurring no costs, until needed. Once it's no longer required, a reuse policy can stop the instance and return it to the warm pool, hopefully retaining the cached images.

    Docs: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-warm-pools.html
    Launched in 2021: https://aws.amazon.com/es/blogs/compute/scaling-your-applications-faster-with-ec2-auto-scaling-warm-pools/

I will read more about the warm pool option and investigate whether GCP has an equivalent.
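
For illustration only: if the user node group's autoscaling group were managed via CloudFormation, a warm pool could be declared roughly as in the sketch below. 2i2c provisions node groups via eksctl/terraform, so the actual mechanism may differ, and the resource names are placeholders.

```yaml
# Illustrative CloudFormation sketch of a warm pool for the user node group's ASG.
# Names are placeholders; see the AWS docs linked above for the authoritative schema.
Resources:
  UserNodeWarmPool:
    Type: AWS::AutoScaling::WarmPool
    Properties:
      AutoScalingGroupName: !Ref UserNodeAutoScalingGroup  # placeholder reference
      PoolState: Stopped        # instances sit stopped, so no compute is billed
      MinSize: 1                # keep one pre-initialized node ready
      InstanceReusePolicy:
        ReuseOnScaleIn: true    # return scaled-in instances to the warm pool
```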

AIDEA775 commented 7 months ago

I have been testing values for memory and CPU to fit 4 users on an n2-highmem-4 node. I realized this:

If I sum the pod requests, the total is 437mCPU, whereas the console indicates that 433mCPU was requested.

[screenshot: console view of the requested CPU]

AIDEA775 commented 7 months ago

Fixing the values:

| Max users on n2-highmem-4 | CPU display | CPU guarantee | CPU limit | RAM guarantee | RAM limit |
|---|---|---|---|---|---|
| 64 | 1/16 | 0.054 | 1 | 0.4258G | 1G |
| 32 | 1/8 | 0.108 | 1 | 0.8516G | 2G |
| 16 | 1/4 | 0.217 | 2 | 1.7033G | 4G |
| 8 | 1/2 | 0.435 | 2 | 3.4067G | 8G |
| 4 | 1 | 0.870 | 2 | 6.8134G | 16G |

The memory was calculated as:

RAM guarantee = (allocatable RAM - overhead RAM) / max users per node

and the CPU as:

CPU guarantee = (allocatable CPU - overhead CPU) / max users per node

The CPU guarantee needs to be truncated to 3 decimals; otherwise, the numbers will be rounded.

Now the users fit correctly on one node.

AIDEA775 commented 7 months ago

Okay no, this one instead:

To simplify the numbers shown to users, we can "reduce" the "total" RAM to 24 GB and 3 vCPU. So the numbers using an n2-highmem-4 node would look like:

| Max users | Option display | CPU guarantee | CPU limit | RAM guarantee | RAM limit |
|---|---|---|---|---|---|
| 78 | 375 MB RAM, ~1/20 CPU | 0.046 (3/64) | 1 | 366210K (375MB) | 1G |
| 39 | 750 MB RAM, ~1/10 CPU | 0.093 (3/32) | 1 | 732421K (750MB) | 2G |
| 19 | 1.5 GB RAM, ~1/5 CPU | 0.187 (3/16) | 2 | 1464843.75K (1.5GB) | 3G |
| 9 | 3 GB RAM, ~1/2 CPU | 0.375 (3/8) | 2 | 2929687.5K (3GB) | 6G |
| 4 | 6 GB RAM, 3/4 CPU | 0.750 (3/4) | 2 | 5859375K (6GB) | 12G |
| 2 | 12 GB RAM, 1.5 CPU | 1.500 (3/2) | 3 | 11718750K (12GB) | 24G |

We don't lie about the RAM reserved, but we do about the CPU, because 3/32 is an odd number that isn't really meaningful to the user, while 1/10 can be a useful approximation.

Then some cool things happen: we can fit 4 users with 6 GB and still have 5.3 GB and 0.49 vCPU free, in which we can also fit:

I think this is the best trade-off between simple numbers and maximum utilization of one node. What do you think @2i2c-org/engineering?

yuvipanda commented 7 months ago

This is great work, @AIDEA775. In particular, I appreciate you looking at usage data to pick the size of the nodes!

I would like you to explore using the script in https://github.com/2i2c-org/infrastructure/tree/master/deployer/commands/generate/resource_allocation to pick the memory and CPU numbers once you have determined the instance type. It should generate appropriate values for you to use, taking into account some overhead plus which daemonsets are being used. The script itself probably needs adjustments, but I think systematizing it is very important so we can have a standard set of resource allocations to offer people that 'fit' users appropriately on nodes.

yuvipanda commented 7 months ago

We don't lie about the RAM reserved, but we do about the CPU, because 3/32 is an odd number that isn't really meaningful to the user, while 1/10 can be a useful approximation.

I'd actually like us to just state the numbers directly, as it's otherwise extremely hard to reason meaningfully about what the numbers say and what they mean. The actual values we set are also shown in the JupyterLab interface. In general, I think it is more confusing for users to see differing numbers in the profile list and in the JupyterLab interface than to see some non-rounded numbers. There's more rationale for this in https://github.com/2i2c-org/infrastructure/issues/3584.

jmunroe commented 7 months ago

This is a fantastic discussion. I don't think the choice of profiles needs to be perfect -- I'd rather roll them out when they are good enough and @2i2c-org/engineering approves, and change things later if needed.

In terms of implementation, should a common set of profiles be set up in the common.values.yaml files of both the catalystproject.africa and catalystproject.latam clusters? That should clean up each of the individual hub config files.
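
As a sketch of that idea (the exact nesting depends on the chart each cluster uses, so treat the keys and path as illustrative): the shared pieces would live once per cluster, and each hub's own values file would only carry hub-specific overrides.

```yaml
# config/clusters/<cluster>/common.values.yaml -- sketch only, not the real file layout
jupyterhub:
  singleuser:
    profileList:
      - display_name: "Standard environment"
        default: true
        kubespawner_override:
          mem_guarantee: 1G
          mem_limit: 2G
```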

yuvipanda commented 7 months ago

I'd rather roll them out when they are good enough and @2i2c-org/engineering approves, and change things later if needed.

This is perfectly fine with me too!

haroldcampbell commented 6 months ago

@jmunroe, @AIDEA775, @yuvipanda, and @consideRatio, circling back on this.

Are the specs below the normalized offerings needed to create hubs for the Catalyst communities?

  1. Standardize on a 0.5 GB RAM, ~1.0 CPU machine type per user, with four users per node (using "small" machines).
  2. Make both a community-maintained RStudio and a JupyterLab environment available.
  3. Include an unlisted_choice option so communities can test deploying their own images.
  4. Deploy a landing page (in English, Spanish, or Portuguese) that references back to the Catalyst Project website for additional information. (Is there an expectation that the engineering team will deliver item #4?)
  5. Use GitHub authentication (and team membership in the Catalyst-Hubs GitHub organization) as the default, unless a community has elected to use a Google Suite style authentication.
  6. Use staging.latam and staging.af as 'reference' deployments, with GitHub authentication through Catalyst-Hubs, to allow investigation of the platform by potential new community champions.

Currently we have these 4 catalyst hub requests:

jmunroe commented 6 months ago

Spec #1 has evolved considerably based on the conversation that has occurred in this issue. I think @AIDEA775 should prepare the formal specification of what is being proposed. Perhaps this can be done most precisely by creating a PR to change the configuration of staging.latam and staging.af. I think that for the Catalyst Hubs a small, medium, and large size are a sufficient number of choices. (Only use every second row of the tables given above.)

The images of Jupyter-Scipy, Rocker Geospatial, and unlisted_choice can also be set up as a PR to the configuration.

The landing page is NOT something that needs to be considered by the engineering team at this time. Yes, I think we eventually need a non-English landing page, but that is not something I would want to block getting these hubs deployed. (@jnywong -- I think documenting how to modify the landing page and resolving how to handle non-English versions is something we will need to address, but it is separate from this issue.)

GitHub authentication is a good choice since we already have that set up. For our other hubs, we have the concept of a 'hub champion' who needs to sign off or provide input as a hub is being deployed -- I think that is a blocking step for getting these hubs set up. My big assumption here is that adding a new 'hub admin' user is something that can be done after the hub is deployed. (In the absence of a 'hub champion' and their GitHub id, management of these Catalyst Hubs should be done by @2i2c-org/partnerships-and-community-guidance, at least in the very short term.)

@jnywong is actively developing training materials that will target this default configuration for Catalyst Hubs.

damianavila commented 6 months ago

Currently we have these 4 catalyst hub requests:

Just a quick clarification: those 4 issues are not "real" hub requests until a new hub request GH issue is created on this very same repo (unless we change the DoR we agreed on in the past and have failed to properly enforce). Actually, we have just one "real" request here: https://github.com/2i2c-org/infrastructure/issues/3740.

jmunroe commented 6 months ago

For completeness, here are three Catalyst Project hubs that need to be deployed pending finalization of this issue:

Western Cape https://github.com/czi-catalystproject/Project-Board/issues/132 is not an active hub request at this time.

AIDEA775 commented 6 months ago

In AWS, using an r5.xlarge instance and the profileList proposed in this comment, we can fit not 4 but 5 users with 6 GiB of RAM each, utilizing 100% of the vCPUs. This is because AWS nodes have slightly more allocatable RAM than GCP's. Only 1.73 GiB is wasted (6% of the node).

Since I'm using an imaginary machine with 24GiB and 3vCPU, I can simply copy-paste the options because they don't depend on the real node capacity.

Screenshot from 2024-03-06 01-23-01


I think that for the Catalyst Hubs a small, medium, and large size are a sufficient number of choices.

oki!

We don't lie about the RAM reserved, but we do about the CPU, because 3/32 is an odd number that isn't really meaningful to the user, while 1/10 can be a useful approximation.

I'd actually like us to just state the numbers directly, as it's otherwise extremely hard to reason meaningfully about what the numbers say and what they mean. The actual values we set are also shown in the JupyterLab interface. In general, I think it is more confusing for users to see differing numbers in the profile list and in the JupyterLab interface than to see some non-rounded numbers. There's more rationale for this in #3584.

Oh, you're displaying the memory limit! And setting the limit equal to the guarantee - I think that's okay. The jupyter/scipy-notebook image doesn't have the memory monitor in the status bar, but it makes sense that if the user selects "3 GB RAM", they can only use up to 3 GB.

I'm also considering simply not displaying the CPU in the options, only the RAM. Would that be too simple?

damianavila commented 6 months ago

A really nice discussion so far! It seems there is consensus about a formal specification for the Catalyst hubs, so I would like to second what @jmunroe said in the linked comment as a way to formalize a DoD for this issue: https://github.com/2i2c-org/infrastructure/issues/3631#issuecomment-1979514766. Particularly these pieces below:

I think @AIDEA775 should prepare the formal specification of what is being proposed. Perhaps this can be done most precisely by creating a PR to change the configuration of staging.latam and staging.af. I think that for the Catalyst Hubs a small, medium, and large size are a sufficient number of choices.

The images of Jupyter-Scipy, Rocker Geospatial, and unlisted_choice can also be set up as a PR to the configuration.

The landing page is NOT something that needs to be considered by the engineering team at this time.

In the absence of a 'hub champion' and their GitHub id, management of these Catalyst Hubs should be done by @2i2c-org/partnerships-and-community-guidance, at least in the very short term

damianavila commented 6 months ago

@haroldcampbell I have added this issue to the new Eng board alongside the newly created hub requests listed in https://github.com/2i2c-org/infrastructure/issues/3631#issuecomment-1979949331.

haroldcampbell commented 6 months ago

Cool. Thanks @damianavila. This issue has taken us 33 working days (Jan 22 - Mar 7) since it was initially raised by @jmunroe.

I feel there is value in doing a post-mortem, as I suspect this could have been handled more effectively. @yuvipanda, @AIDEA775, @consideRatio and @jmunroe, do we need to do a post-mortem?

I'm keen to facilitate this next week if we believe there is value in doing a post-mortem.

haroldcampbell commented 6 months ago

Adding [Catalyst-Africa] New lot of hubs (kush, wits and molerhealth) #3808 for visibility.

damianavila commented 6 months ago

The latest referenced hubs were deployed and we even had a retrospective about this whole process, so I am tempted to close this one now. Feel free to re-open if you disagree with me.