eksctl-io / eksctl

The official CLI for Amazon EKS
https://eksctl.io

[Bug] Integration tests failing for Inferentia and Trainium #6772

Closed: Himangini closed this issue 11 months ago

Himangini commented 1 year ago

After merging https://github.com/weaveworks/eksctl/pull/6763 we encountered a few issues and failures in our integration tests. We need to fix these before release.

Trainium test
Summarizing 2 Failures:
[0]   [FAIL] (Integration) Trainium nodes cluster with trn nodes with --install-neuron-plugin=false when adding an unmanaged nodegroup by default [It] should install without error
[0]   /__w/eksctl-ci/eksctl-ci/eksctl/integration/tests/trainium/trainium_test.go:174
[0]   [FAIL] (Integration) Trainium nodes cluster with trn nodes with --install-neuron-plugin=false when adding an unmanaged nodegroup by default [It] should install the neuron device plugin
[0]   /__w/eksctl-ci/eksctl-ci/eksctl/integration/tests/trainium/trainium_test.go:178
Error:  exceeded max wait time for StackCreateComplete waiter

Inferentia test
 [FAIL] [BeforeSuite] 
[0]   /__w/eksctl-ci/eksctl-ci/eksctl/integration/tests/inferentia/inferentia_test.go:78
/__w/eksctl-ci/eksctl-ci/eksctl/integration/tests/inferentia/inferentia_test.go:54
[0] starting '../../../eksctl "--region" "us-west-2" "create" "cluster" "--verbose" "4" "--name" "it-inf-no-plugin-ferocious-hideout-1688696837" "--tags" "alpha.eksctl.io/description=eksctl integration test" "--install-neuron-plugin=false" "--nodegroup-name" "inf-ng-0" "--node-labels" "ng-name=inf-ng-0" "--nodes" "1" "--node-type" "inf2.xlarge" "--version" "1.25" "--zones" "us-west-2a,us-west-2c,us-west-2d" "--kubeconfig" "/tmp/it-inf-wonderful-sheepdog-1688696837.yaml"'
[0]   2023-07-07 02:27:17 [▶]  role ARN for the current session is "arn:aws:sts::***:assumed-role/GithubActionRole/integration-test-inferentia"
[0]   2023-07-07 02:27:17 [ℹ]  eksctl version 0.149.0-dev+591433a27.2023-07-07T02:20:04Z
[0]   2023-07-07 02:27:17 [ℹ]  using region us-west-2
[0]   2023-07-07 02:27:17 [▶]  determining instance availability in zones
[0]   Error: none of the provided AZs "" support instance type inf2.xlarge in nodegroup inf-ng-0

Himangini commented 11 months ago

Closing this issue since the PR in question was reverted.