Open swirkert1 opened 1 month ago
I think a workaround is to let them run sequentially in different groups as the bug seems to be connected to running them in parallel.
Unfortunately no. While it says "SUCCEEDED" in the state, the PVC of the previous integration module was deleted. Also, when trying this out with a third FSx volume and integration module, it failed again. It all seems kind of random.
I think for the bug with the set-permissions-job we need to give it a unique name here: "metadata": {"name": "set-permissions-job", "namespace": eks_namespace}. For the rest, I don't know.
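For illustration, here is a minimal, hedged sketch (not the module's actual code) of what a per-module Job name could look like in that manifest dict. The `module_name` variable, the container image, and the command are assumptions/placeholders, not taken from the module:

```python
# Hedged sketch: give the Job a per-module name so two fsx-lustre-on-eks instances on
# the same cluster don't both try to create a Job called "set-permissions-job".
module_name = "lustre-on-eks-1a"   # assumption: unique per module instance
eks_namespace = "default"          # assumption: target namespace

set_permissions_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {
        # unique per module instance instead of the hard-coded "set-permissions-job"
        "name": f"set-permissions-job-{module_name}",
        "namespace": eks_namespace,
    },
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "set-permissions",
                        "image": "public.ecr.aws/amazonlinux/amazonlinux:2",  # placeholder image
                        "command": ["sh", "-c", "chmod -R 777 /data"],        # placeholder command
                        "volumeMounts": [{"name": "fsx-volume", "mountPath": "/data"}],
                    }
                ],
                "volumes": [
                    {
                        "name": "fsx-volume",
                        "persistentVolumeClaim": {"claimName": f"fsx-claim-{module_name}"},
                    }
                ],
                "restartPolicy": "Never",
            }
        }
    },
}
```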
I gave the permission jobs unique names and made the PV and PVC no longer depend on the namespace. Now it works after the second make deploy.
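A hedged sketch of what decoupling the PV/PVC names from the namespace could look like, again keyed on an assumed per-module identifier; the file system id, DNS name, mount name, and storage size below are placeholders, not values from the module:

```python
# Hedged sketch: name the PV/PVC per module instance rather than per namespace, so two
# FSx volumes attached to the same namespace get distinct Kubernetes objects.
module_name = "lustre-on-eks-1a"   # assumption: unique per module instance
eks_namespace = "default"          # assumption: target namespace

fsx_pv = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": f"fsx-pv-{module_name}"},  # no longer derived from the namespace
    "spec": {
        "capacity": {"storage": "1200Gi"},
        "volumeMode": "Filesystem",
        "accessModes": ["ReadWriteMany"],
        "persistentVolumeReclaimPolicy": "Retain",
        "csi": {
            "driver": "fsx.csi.aws.com",
            "volumeHandle": "fs-0123456789abcdef0",  # placeholder FSx file system id
            "volumeAttributes": {
                "dnsname": "fs-0123456789abcdef0.fsx.eu-central-1.amazonaws.com",  # placeholder
                "mountname": "abcdefgh",  # placeholder
            },
        },
    },
}

fsx_pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": f"fsx-claim-{module_name}", "namespace": eks_namespace},
    "spec": {
        "accessModes": ["ReadWriteMany"],
        "storageClassName": "",  # static provisioning against the PV above
        "resources": {"requests": {"storage": "1200Gi"}},
        "volumeName": f"fsx-pv-{module_name}",
    },
}
```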
Describe the bug Create an EKS cluster and two FSx volumes.
Now use path: git::https://github.com/awslabs/idf-modules.git//modules/integration/fsx-lustre-on-eks?ref=release/1.11.0&depth=1
two times to connect the FSx volumes to the cluster. This fails: one time with the EksHandlerRoleArn (which is not documented but is needed) already existing, and the second time with the set_permissions_job already existing.
To Reproduce
Expected behavior Resources are created.
Screenshots addf-llpdrsw-integration-lustre-on-eks-1a | 10/13 | 2:06:49 PM | CREATE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | addf-llpdrsw-integration-lustr-eks-cluster/manifest-SetPermissionsJob/Resource/Default (addfllpdrswintegrationlustreksclustermanifestSetPermissionsJob14F57FB1) Received response status [FAILED] from custom resource. Message returned: Error: b'Error from server (AlreadyExists): error when creating "/tmp/manifest.yaml": jobs.batch "set-permissions-job" already exists\n'