Open swirkert1 opened 1 week ago
Thanks for this issue. We will investigate. Preliminary inspection indicates that the lustre-on-eks module doesn't support IP whitelisting, but we will look into it.
Thanks Derek,
If that is the case, what would you recommend? Should I replicate the behaviour of lustre-on-eks by issuing the necessary cluster manipulations "manually" using kubectl?
I think we need to investigate first before we propose a solution.
RCA: When using ips_to_whitelist_adhoc, a PUBLIC_AND_PRIVATE cluster API endpoint is created: public access is limited to the whitelisted IPs, and private access is limited to traffic from within the VPC. The FSx for Lustre on EKS integration module deploys manifests via custom resources, so the custom resource Lambdas must be provisioned in private VPC subnets to be able to reach the cluster API endpoint.
@swirkert1, a fix is merged and planned for the upcoming release. Once the release is cut, please make sure to update your integration/fsx-for-lustre-on-eks manifests to the latest version and pass the VpcId and PrivateSubnetIds of your EKS cluster:
- name: VpcId
  valueFrom:
    moduleMetadata:
      group: base
      name: networking
      key: VpcId
- name: PrivateSubnetIds
  valueFrom:
    moduleMetadata:
      group: base
      name: networking
      key: PrivateSubnetIds
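As a quick sanity check before redeploying, something like the following sketch could confirm that both parameters are present. It assumes the manifest's parameter list has already been loaded into a list of dictionaries shaped like the entries above; the helper name `missing_network_parameters` is hypothetical and not part of any module or tool mentioned in this thread.

```python
# Parameter names the fix expects, as named in the comment above.
REQUIRED_NETWORK_PARAMETERS = ("VpcId", "PrivateSubnetIds")


def missing_network_parameters(parameters):
    """Return the required parameter names absent from a parameter list.

    `parameters` is assumed to be a list of dicts, each with a "name" key,
    mirroring the structure of the manifest entries.
    """
    present = {entry.get("name") for entry in parameters}
    return [name for name in REQUIRED_NETWORK_PARAMETERS if name not in present]


# Example: a parameter list that only passes VpcId.
example = [
    {
        "name": "VpcId",
        "valueFrom": {
            "moduleMetadata": {"group": "base", "name": "networking", "key": "VpcId"}
        },
    }
]
print(missing_network_parameters(example))  # ['PrivateSubnetIds']
```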
Describe the bug
When using EKS with IP whitelists, deploying the lustre integration module fails.
To Reproduce
path: git::https://github.com/awslabs/idf-modules.git//modules/compute/eks?ref=release/1.11.0&depth=1
with IP whitelists
path: git::https://github.com/awslabs/idf-modules.git//modules/storage/fsx-lustre?ref=release/1.11.0&depth=1
path: git::https://github.com/awslabs/idf-modules.git//modules/integration/fsx-lustre-on-eks?ref=release/2.11.0&depth=1
Expected behavior
EKS and FSx are created and connected via lustre-on-eks.
Error
I guess it cannot download the manifest because it no longer has a connection to the outside?
addf-llpdrsw-integration-lustre-on-eks | 7/13 | 11:09:16 AM | CREATE_IN_PROGRESS | Custom::AWSCDK-EKS-KubernetesResource | namespace/Resource/Default (namespace177341A3) Resource creation Initiated
580 | addf-llpdrsw-integration-lustre-on-eks | 7/13 | 11:09:17 AM | CREATE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | namespace/Resource/Default (namespace177341A3) Received response status [FAILED] from custom resource. Message returned: Error: Operation failed after 3 attempts: b'error: error validating "/tmp/manifest.yaml": error validating data: failed to download openapi: Get "https://2F84111B4C79BC02AE237DC7CE5AC928.yl4.eu-central-1.eks.amazonaws.com/openapi/v2?timeout=32s": dial tcp 3.64.123.127:443: i/o timeout; if you choose to ignore these errors, turn validation off with --validate=false\n'