Closed perrydevrekomodo closed 6 months ago
@lusoal Would you be able to check this one?
Sure, leave this with me. I'll look at it tomorrow.
./cleanup
Removed .terraform and reinstalled. From both the browser and the JupyterHub operator, I am getting similar errors about the nodes:
2023-12-10T23:23:49.061887Z [Warning] 0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..
2023-12-10T23:23:50Z [Normal] Pod should schedule on: nodeclaim/gpu-ts-xln8j
2023-12-10T23:24:22Z [Normal] pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector
2023-12-10T23:29:10.795105Z [Warning] 0/4 nodes are available: 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..
2023-12-10T23:39:40Z [Normal] Pod should schedule on: nodeclaim/gpu-ts-cvkdn
Spawn failed: Timeout
The nodeclaim seems to give a more helpful error:
`creating instance, with fleet error(s), InvalidParameter: Security group sg-063f3ce9c9129f915 and subnet subnet-03c6b3adbd7c25760 belong to different networks.; InvalidParameter: Security group sg-063f3ce9c9129f915 and subnet subnet-0355df2dfc137557b belong to different networks.`
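In case it helps narrow this down: "belong to different networks" usually means the security group and the subnets are in different VPCs. Below is a rough sketch (not part of the blueprint) that pulls the IDs out of the nodeclaim error and prints AWS CLI commands to compare their `VpcId` values. The helper names `extract_ids` and `vpc_check_commands` are made up for illustration, and the commands assume your default region and credentials already point at the cluster's account.

```python
import re

# Error text copied from the nodeclaim status above.
ERROR = (
    "creating instance, with fleet error(s), InvalidParameter: Security group "
    "sg-063f3ce9c9129f915 and subnet subnet-03c6b3adbd7c25760 belong to "
    "different networks.; InvalidParameter: Security group "
    "sg-063f3ce9c9129f915 and subnet subnet-0355df2dfc137557b belong to "
    "different networks."
)


def extract_ids(message):
    """Return (security_group_ids, subnet_ids), de-duplicated, in order of appearance."""
    sgs = list(dict.fromkeys(re.findall(r"\bsg-[0-9a-f]+\b", message)))
    subnets = list(dict.fromkeys(re.findall(r"\bsubnet-[0-9a-f]+\b", message)))
    return sgs, subnets


def vpc_check_commands(message):
    """Build AWS CLI commands that print the VPC of each resource named in the error."""
    sgs, subnets = extract_ids(message)
    commands = []
    if sgs:
        commands.append(
            "aws ec2 describe-security-groups --group-ids " + " ".join(sgs)
            + " --query 'SecurityGroups[].[GroupId,VpcId]' --output text"
        )
    if subnets:
        commands.append(
            "aws ec2 describe-subnets --subnet-ids " + " ".join(subnets)
            + " --query 'Subnets[].[SubnetId,VpcId]' --output text"
        )
    return commands


if __name__ == "__main__":
    # Print the commands; run them by hand and compare the VpcId columns.
    for command in vpc_check_commands(ERROR):
        print(command)
```

If the VPC IDs differ, the security group or subnets that Karpenter discovers are probably left over from an earlier deployment, but that is a guess until the output is compared.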
Description
Please provide a clear and concise description of the issue you are encountering, and a reproduction of your configuration.
If your request is for a new feature, please use the Feature request template.
⚠️ Note
Before you submit an issue, please perform the following for Terraform examples:
Remove the .terraform directory (! ONLY if state is stored remotely, which hopefully you are following that best practice!): rm -rf .terraform/
Re-initialize: terraform init
Versions
Module version [Required]:
Terraform version:
% terraform -version
Terraform v1.5.7 on darwin_amd64
Provider version(s):
% terraform providers -version
Terraform v1.5.7 on darwin_amd64
provider registry.terraform.io/hashicorp/aws v5.29.0
provider registry.terraform.io/hashicorp/cloudinit v2.3.3
provider registry.terraform.io/hashicorp/helm v2.12.1
provider registry.terraform.io/hashicorp/kubernetes v2.24.0
provider registry.terraform.io/hashicorp/random v3.1.0
provider registry.terraform.io/hashicorp/time v0.9.2
provider registry.terraform.io/hashicorp/tls v4.0.5
Reproduction Code [Required]
Steps to reproduce the behavior: https://github.com/awslabs/data-on-eks/tree/main/ai-ml/jupyterhub
Not using workspaces.
Yes, I have cleared the local cache.
Port-forwarded using the commands below, then opened the URL in a browser.
aws eks --region us-west-2 update-kubeconfig --name jupyterhub-on-eks
kubectl port-forward svc/proxy-public 8080:80 -n jupyterhub
http://localhost:8080/
Expected behavior
Upon sign-in, clicking the Data Science option should trigger the Karpenter provisioner to launch a new g5.2xlarge instance, schedule the user-1 JupyterHub pod on it, and pull the Docker image.
Actual behavior
I get the errors below.