@consideRatio I believe this is now complete and you should be able to set up the hub now. I notice in https://github.com/2i2c-org/infrastructure/pull/3854 it's set up as a daskhub - note that it should instead be set up as a base hub.
As you go through these, if you find you're having to make choices that make use of information not present in this issue, please point it out so I can make sure to incorporate that into the process.
Thanks.
Because they want more CPUs in their GPU nodes, we also need to set up g4dn.2xlarge nodes in eksctl.
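For reference, a minimal eksctl sketch of such a nodegroup (the cluster name, region, and sizes here are assumptions, not taken from the actual config):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: linc        # assumed cluster name
  region: us-east-2 # assumed region
nodeGroups:
  - name: nb-g4dn-2xlarge
    instanceType: g4dn.2xlarge # 8 vCPUs, 32 GiB RAM, 1x NVIDIA T4
    minSize: 0                 # scale to zero when no GPU users are active
    maxSize: 2                 # assumed upper bound
```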
> Hub 2: LINC hub
> The same as staging, just different name (linc).
This made me think that you requested that, instead of naming the hub `prod` in our config, we may name it `linc`, which raised questions like: do we let the domain name be `linc.2i2c.cloud` or `linc.linc.2i2c.cloud`?

Looking at how you set things up for bican, I'm assuming the config name should be `prod`, and the domain name should be `linc.2i2c.cloud` without staging, allowing specialized hubs to be named `<something>.linc.2i2c.cloud`.
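Under that assumption, a sketch of the hubs list in `cluster.yaml` (fields abridged; the values are my assumptions):

```yaml
hubs:
  - name: staging
    display_name: LINC (staging) # assumed, see display_name discussion below
    domain: staging.linc.2i2c.cloud
    helm_chart: basehub
  - name: prod
    display_name: LINC
    domain: linc.2i2c.cloud
    helm_chart: basehub
```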
@yuvipanda when filling in `funded_by`, I don't know what to write. On the lincbrain website I see this, but it doesn't mean this 2i2c hub should be considered funded by them as well.

For now, leaving it blank:
```yaml
funded_by:
  name: ""
  url: ""
```
Phase 3.2: Authentication
| Question | Answer |
| --- | --- |
| Authentication Mechanism | GitHub (via `GitHubOAuthenticator`) |
| Org based access? | No |
| Admin Users | @kabilar, @aaronkanzer, @asmacdo, @satra |
I'll set this up to provide only the admin users access for now, not enabling `allow_all: true` or similar.
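A minimal sketch of what that looks like in the hub's helm values, assuming the standard z2jh config keys (OAuth app client id/secret omitted):

```yaml
jupyterhub:
  hub:
    config:
      JupyterHub:
        authenticator_class: github
      Authenticator:
        # Only these users can log in for now; no allow_all / org-based access
        admin_users: &admins
          - kabilar
          - aaronkanzer
          - asmacdo
          - satra
        allowed_users: *admins
```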
The staging and prod clusters' `display_name:` from `cluster.yaml` is not explicitly specified. Among existing clusters we have three different conventions for combining LINC / DANDI / BICAN with MIT, and for including or omitting "(prod)":

- `LINC (staging)` and `LINC`
- `MIT DANDI (staging)` and `MIT DANDI`
- `BICAN (staging)` and `BICAN (prod)`
I also observed a discrepancy in how we configure `jupyterhub.custom.homepage.templateVars`: it is either set once in common config with no adjustment in staging/prod, or adjusted in staging/prod for the `org.name`. I'll go with the version without customization for this dedicated hub, as staging remains in the domain name and that may be sufficient distinction.
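That is, something like this once in the common values file, shared by staging and prod (the exact value is an assumption):

```yaml
jupyterhub:
  custom:
    homepage:
      templateVars:
        org:
          name: LINC # not overridden to "LINC (staging)" in staging
```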
Defaulting to `/lab` and setting `allowNamedServers: true` were assumed to be wanted based on the dandi/bican config, but neither is explicit in the specification; see the sketch below.
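As a sketch, those two assumptions translate to the following z2jh helm values:

```yaml
jupyterhub:
  singleuser:
    defaultUrl: /lab # land users in JupyterLab rather than classic notebook
  hub:
    allowNamedServers: true
```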
I tested the biggest resource allocation option for all machine types, and only the GPU options spawned, not the others. I'll look into fixing it for dandi/bican/linc.
EDIT: Fixed for bican/dandi/linc in PR
Starting up a GPU server (I don't remember which image) took somewhere between 9 and 10 minutes, and the startup timeout is 10 minutes. I've increased the timeout to 15 minutes to provide some margin of error for bican/dandi/linc for now.
EDIT: Fixed in PR
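For reference, a sketch of the timeout bump in z2jh helm values (its exact placement in our shared config is an assumption):

```yaml
jupyterhub:
  singleuser:
    # GPU node scale-up + image pull was observed to take ~9-10 minutes,
    # so allow 15 minutes before giving up on a spawn
    startTimeout: 900
```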
Thanks for the feedback, @consideRatio. I'll incorporate it into the process.
I think this was completed and then we decommissioned it as well - closing.
Copied over from https://github.com/2i2c-org/meta/issues/913
Process Note
I'm using this as a way to try to rejig our new hub request process. See https://github.com/2i2c-org/meta/issues/897 (particularly https://github.com/2i2c-org/meta/issues/897#issuecomment-2010984904) for more information.
https://miro.com/app/board/uXjVNjUP3iQ=/ describes the various 'phases' of new hub turn-up. Each phase will be marked as "READY" or "NOT READY" based on whether all information needed for it is available. Each section should also link to an appropriate runbook.
There will be customizations after this is all set up, but this is a pathway towards a standardized hub turn-up.
Phase 1: Account setup (READY)
This is applicable for cases where this is a dedicated cluster. The following table lists the information needed before this phase can start.

| Question | Answer |
| --- | --- |
| Account name | `linc` |
Appropriate runbook: https://infrastructure.2i2c.org/hub-deployment-guide/cloud-accounts/new-aws-account/
Phase 2: Cluster setup (READY)
This assumes all engineers have access to this new account, and will be able to set up the cluster + support, without any new hubs being set up.
Appropriate runbooks:
Phase 3: Hub setup (READY)
There are going to be a number of hubs, and this starts specifying them.
Hub 1: Staging
Phase 3.1: Initial setup
Phase 3.2: Authentication
@kabilar, @aaronkanzer, @asmacdo, @satra
Phase 3.3: Object storage access
Phase 3.4: Profile List
This was derived from looking at https://github.com/dandi/dandi-hub/blob/dandi/config.yaml.j2#L138-L210 and adapting it to match our standards.
Environments
Resource Allocations
CPU
Generated by `deployer generate resource-allocation choices r5.xlarge --num-allocations 4`
GPU
Manually set up, but should be autogenerated
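A sketch of what such a manually specified GPU option can look like via kubespawner overrides (the numbers are assumptions for a g4dn.2xlarge, leaving headroom for system daemons):

```yaml
- display_name: "1 GPU, ~7 CPU, ~28 GB RAM"
  kubespawner_override:
    cpu_guarantee: 7
    mem_guarantee: 28G
    extra_resource_limits:
      nvidia.com/gpu: "1" # request the node's single T4 GPU
```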
Hub 2: LINC hub
The same as staging, just different name (linc).