Open consideRatio opened 1 year ago
# documented like this
python3 deployer generate-cluster <cluster-name> aws
# in practice done like this
pip install -e .
deployer generate-aws-cluster --cluster-name=ubc-eoas --hub-type=basehub --cluster-region=ca-central-1
After generating the .jsonnet file, the zones in it didn't match the availability zones that actually exist: 1a, 1b, and 1c were generated, but only 1a, 1b, and 1d existed.
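A minimal sketch of the kind of check that would have caught this, assuming the zone names carry the ca-central-1 region prefix from the command above. `missing_zones` is a hypothetical helper, not part of the deployer; the list of available zones is hard-coded here but could come from something like `aws ec2 describe-availability-zones`.

```python
def missing_zones(generated, available):
    """Return the generated zone names that the region does not offer, in order."""
    available_set = set(available)
    return [zone for zone in generated if zone not in available_set]


# Zone names reconstructed from the report above (region prefix is an assumption).
generated = ["ca-central-1a", "ca-central-1b", "ca-central-1c"]
available = ["ca-central-1a", "ca-central-1b", "ca-central-1d"]

print(missing_zones(generated, available))
```

An empty list would mean the generated file only references zones that exist.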
Document experiences in upgrading beyond 1.22. I wrote about this in https://2i2c.slack.com/archives/CKJS000F4/p1671374097438499.
jsonnet needs to be installed, but at no point was I asked to install it as a prerequisite to deploying a new hub.
There’s no requirement to commit the *.eksctl.yaml file to the repository since we can regenerate it using the above jsonnet command.
We .gitignore them, so one can't.
Following a terraform step, I got a .terraform.lock.hcl file in terraform/aws, and I didn't understand what it was for.
We shouldn't version control the .terraform.lock.hcl file, right? See Terraform's documentation about the Dependency Lock File.
Action: add to .gitignore
I've seen python deployer use-cluster-credentials, python3 deployer, and just deployer mentioned. I think we should stick with assuming it's already installed via pip install -e .
There is a link that is outdated and points to the wrong lines of relevance:
https://github.com/2i2c-org/infrastructure/tree/HEAD/.github/workflows/deploy-hubs.yaml#L31-L36
In practice, we seem to need to add things to the output list of the upgrade-support-and-staging job in deploy-hubs.yaml and in the matrix for the deploy_grafana_dashboards job in deploy-grafana-dashboards.yaml.
Once the deploy chart was deployed and you were able to log into grafana as the admin user, you can generate an API key.
"deploy chart" should be "support chart"
the following sops-ecrypted file config/clusters/
/enc-grafana-token.secret.yaml, with a content similar to:
The subsequent indentation is off.
Typo in:
for a hub are simmilar with the ones for
It's unclear to me in CILogon authentication if I should use CILogon directly or involve auth0 somehow.
It's unclear to me how to configure shown_idps for CILogon. It seems that the listed shown_idps are not always the EntityID name listed in https://cilogon.org/idplist/.
From discussion and trial, it seems that shown_idps should reference the EntityID exactly as listed via https://cilogon.org/idplist/.
This page seems outdated, referencing https://2i2c.org/pilot and getting redirected to https://docs.2i2c.org/
logout button issue in default login page template
It's not clear at this point where the central grafana URL is.
specific GitHub organization you wish to allow login.
Followed by incorrect indentation
This section didn't provide a link to the list of github applications and had incorrect formatting.
Mention that deleting data can take a long time in https://infrastructure.2i2c.org/en/latest/hub-deployment-guide/hubs/other-hub-ops/delete-hub.html#delete-data.
Steps 3 and 4 can be actioned while this PR is reviewed and merged.
"After" merged
Typo at https://infrastructure.2i2c.org/en/latest/howto/grafana-github-auth.html (For example, ghttps://grafana.pilot.2i2c.cloud)
I had to create a directory for my new cluster in config/clusters before letting terraform write a file to that location.
I saw this, but CLUSTER_NAME wasn't prefixed with $
deployer deploy-support CLUSTER_NAME
When configuring domain names via namecheap, it would be good to link directly to where this is done: https://ap.www.namecheap.com/Domains/DomainControlPanel/2i2c.cloud/advancedns instead of namecheap.com.
This command should be without create as a standalone arg:
deployer cilogon-client-create create 2i2c dask-staging daskhub dask-staging.2i2c.cloud
Update AWS account creation docs to suggest use of email sub-addressing, like support+aws-<account name>@2i2c.org, instead of creating new emails.
Related to https://github.com/2i2c-org/meta/issues/535
Verification that this generated .jsonnet logic is correct and makes sense.
This looks incorrect but I'm not sure. Or maybe this generates correctly even if basehub != daskhub, giving daskNodes a null value?
Most of the things, if not all, are resolved or tracked somewhere else. Closing now, re-open if you disagree.
I've kept the top post updated, removing items from it as I resolved them. I can have this closed or manage it privately, as it doesn't really merit attention from others.
I'll open it for now.
What remains to be resolved
9
This is about the aws step on granting eksctl access to other users.
I wonder if we should even do that if we can use the deployer script to get credentials though, hmm. Also, I have not yet created additional users for that account.
UPDATE: Yes, because we need such credentials when we do operations like adding/removing node pools as well. One may ask why; the answer is that, for example, kubectl drain is used, which is a k8s API interaction. The action point is to make it clear why we add this permission.
14
It's unclear to me in the Enable authentication section if we are supposed to add 2i2c members to be able to authenticate, and if so, with what identity provider (GitHub team? Google email accounts from 2i2c.org?).
23
I saw no mention of cleaning up scratch buckets, but I think we should consider that as well in the decommissioning process.
24
This section didn't link to how to create an incident response issue. I asked myself, where? In 2i2c-org/infrastructure?
https://team-compass.2i2c.org/en/latest/projects/managed-hubs/incidents.html#key-terms
25
I'm not sure if /pd trigger works, or in what channel, or similar. I never managed to see a popup like the one described in https://team-compass.2i2c.org/en/latest/projects/managed-hubs/incidents.html#incident-response-process.
32
In this comment, in step 1, I ask the community reps to help authorize the github oauth application to receive organizational membership info from users instead of asking to become an owner and do it for them. With it, I provided a screenshot example.
https://github.com/2i2c-org/infrastructure/issues/2323#issuecomment-1505721800
33
The section on setting up a new GCP project with the existing billing account should make it clear that only new billing accounts need to configure cost exports; new GCP projects already linked to the 2i2c billing account do not.
https://infrastructure.2i2c.org/hub-deployment-guide/cloud-accounts/new-gcp-project/#create-a-new-gcp-project
34
There should be a final step linking to setting up quotas in https://infrastructure.2i2c.org/hub-deployment-guide/cloud-accounts/new-gcp-project
35
The docs for creating a new gcp project don't mention the ability to generate that, the cluster config, etc. via deployer generate-gcp-cluster
https://infrastructure.2i2c.org/hub-deployment-guide/cloud-accounts/new-gcp-project
36
When generating a new GCP cluster:
37
The gcp cluster variable prefix is used to generate resources, and if it's more than 20 letters, the generated resource names, such as <prefix>-cluster-sa, become longer than accepted. We could add validation to avoid this.
Is it okay to make catalystproject-latam become latam, for example?
I think so, because the following resources seemed to include it in their names
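On the validation idea in this item, a rough sketch of what such a check could look like. The 20-character limit is taken from my observation above, not from GCP documentation or the deployer source, and `validate_cluster_prefix` is a hypothetical name rather than an existing deployer function.

```python
MAX_PREFIX_LENGTH = 20  # assumption: limit as observed above, not verified against GCP docs


def validate_cluster_prefix(prefix, max_length=MAX_PREFIX_LENGTH):
    """Return the prefix unchanged, or raise ValueError if it is too long."""
    if len(prefix) > max_length:
        raise ValueError(
            f"prefix '{prefix}' is {len(prefix)} characters long, "
            f"but at most {max_length} are allowed; generated names like "
            f"'{prefix}-cluster-sa' would be rejected"
        )
    return prefix


# The shortened prefix passes; the 21-character one is rejected.
validate_cluster_prefix("latam")
try:
    validate_cluster_prefix("catalystproject-latam")
except ValueError as error:
    print(error)
```

Failing early at generation time would be cheaper than finding out from a failed terraform apply.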
38
Wrong directory mentioned in leading comment at https://infrastructure.2i2c.org/hub-deployment-guide/new-cluster/new-cluster/#exporting-and-encrypting-the-cluster-access-credentials. It's really about making sure that the deployer gets credentials to the cluster in a file enc-deployer-credentials.secret.json put in config/clusters/cluster_name.
39
From https://infrastructure.2i2c.org/hub-deployment-guide/hubs/new-hub/
Should be a "Helm chart configuration"
40
Actually, user servers are "containers" running in isolation from each other, but possibly on the same physical machine.
https://docs.2i2c.org/user/topics/data/filesystem/
41
Mention that https://cloud.google.com/logging/docs/view/query-library is a good reference for queries in GCP
42
In GPU setup, mention a check for GPU availability in zones