Closed cPu1 closed 1 year ago
Inferentia and Trainium integration tests are failing with this error:
[0] (Integration) Inferentia nodes cluster with inf nodes with --install-neuron-plugin=false when adding an unmanaged nodegroup by default [It] should install the neuron device plugin [0] /__w/eksctl-ci/eksctl-ci/eksctl/integration/tests/inferentia/inferentia_test.go:165 [0] [0] [FAILED] Unexpected error: [0] <*errors.StatusError | 0xc000351f40>: [0] daemonsets.apps "neuron-device-plugin-daemonset" not found [0] { [0] ErrStatus: { [0] TypeMeta: {Kind: "", APIVersion: ""}, [0] ListMeta: { [0] SelfLink: "", [0] ResourceVersion: "", [0] Continue: "", [0] RemainingItemCount: nil, [0] }, [0] Status: "Failure", [0] Message: "daemonsets.apps \"neuron-device-plugin-daemonset\" not found", [0] Reason: "NotFound", [0] Details: { [0] Name: "neuron-device-plugin-daemonset", [0] Group: "apps", [0] Kind: "daemonsets", [0] UID: "", [0] Causes: nil, [0] RetryAfterSeconds: 0, [0] }, [0] Code: 404, [0] }, [0] } [0] occurred
Tranium
Trainium nodes cluster with trn nodes with --install-neuron-plugin=false when adding an unmanaged nodegroup by default [It] should install the neuron device plugin [0] /__w/eksctl-ci/eksctl-ci/eksctl/integration/tests/trainium/trainium_test.go:181 [0] [0] [FAILED] Unexpected error: [0] <*errors.StatusError | 0xc0000e7e00>: [0] daemonsets.apps "neuron-device-plugin-daemonset" not found [0] { [0] ErrStatus: { [0] TypeMeta: {Kind: "", APIVersion: ""}, [0] ListMeta: { [0] SelfLink: "", [0] ResourceVersion: "", [0] Continue: "", [0] RemainingItemCount: nil, [0] }, [0] Status: "Failure", [0] Message: "daemonsets.apps \"neuron-device-plugin-daemonset\" not found", [0] Reason: "NotFound", [0] Details: { [0] Name: "neuron-device-plugin-daemonset", [0] Group: "apps", [0] Kind: "daemonsets", [0] UID: "", [0] Causes: nil, [0] RetryAfterSeconds: 0, [0] }, [0] Code: 404, [0] }, [0] } [0] occurred [0] In [It] at: /__w/eksctl-ci/eksctl-ci/eksctl/integration/tests/trainium/trainium_test.go:183
Closing this since the related PR got reverted and these failures are no longer happening.
Inferentia and Trainium integration tests are failing with this error:
Tranium