aws-samples / service-catalog-engine-for-terraform-os

Apache License 2.0

Using TRE EXTERNAL Product type to deploy EKS and Kubernetes resources #90

Open sitadconsulting opened 3 months ago

sitadconsulting commented 3 months ago

Thanks and Appreciation

First, we want to thank the Service Team for their excellent work in developing this capability for self-service infrastructure provisioning for the TRE EXTERNAL product type.

The gains we have made

With the existing capability of the TRE engine, we have been able to successfully deploy an EKS cluster and subsequently layer Kubernetes resources on top of it. The cluster and the Kubernetes resources are deployed as distinct products. We have received help along the way from both Service Catalog and EKS Engineers. Thank you!

The issue we are facing

The Service Catalog End User is unable to terminate the Kubernetes resources product.

The error resulting from the issue

Error: Get "http://localhost/api/v1/namespaces/kubecost": dial tcp 127.0.0.1:80: connect: connection refused
Error: Get "http://localhost/api/v1/namespaces/externaldns": dial tcp 127.0.0.1:80: connect: connection refused
Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
Error: Get "http://localhost/api/v1/namespaces/aws-pca-issuer": dial tcp 127.0.0.1:80: connect: connection refused
Error: Get "http://localhost/api/v1/namespaces/cert-manager": dial tcp 127.0.0.1:80: connect: connection refused
Error: Get "http://localhost/api/v1/namespaces/kube-system/serviceaccounts/aws-load-balancer-controller": dial tcp 127.0.0.1:80: connect: connection refused
Error: Kubernetes cluster unreachable: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable

Our understanding of this error

For any tool to interact successfully with an EKS cluster, it needs to be given the cluster CA certificate, the cluster endpoint URL and the cluster name; otherwise, it defaults to assuming that the connection request is for a locally deployed cluster. We therefore suspect that the Terminate Product workflow may not have these parameters set, so it assumes local cluster connectivity and fails with the errors shown above.
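To make the suspected fallback concrete, here is a minimal Python sketch of how a client might resolve its connection settings. It is only an illustration of our understanding, not the provider's or the engine's actual code: the variable names `cluster_endpoint`, `cluster_ca_certificate` and `cluster_name`, and the `resolve_connection` helper, are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Fallback host, matching the "http://localhost" seen in the errors above.
DEFAULT_HOST = "http://localhost"


@dataclass(frozen=True)
class ClusterConnection:
    host: str
    ca_certificate: Optional[str]
    cluster_name: Optional[str]

    @property
    def is_local_fallback(self) -> bool:
        # True when no endpoint was supplied and the default host is used.
        return self.host == DEFAULT_HOST


def resolve_connection(variables: Mapping[str, str]) -> ClusterConnection:
    """Build connection settings from the supplied variables.

    The key names are assumptions made for this sketch only.
    """
    return ClusterConnection(
        host=variables.get("cluster_endpoint") or DEFAULT_HOST,
        ca_certificate=variables.get("cluster_ca_certificate"),
        cluster_name=variables.get("cluster_name"),
    )
```

With the three values supplied, the connection targets the real endpoint; with an empty mapping, which is what we suspect happens during termination, `resolve_connection({})` yields the localhost fallback.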

The terraform_runner perform_apply and perform_destroy functions

We have observed that the perform_apply function of the terraform_runner script is passed, as artifact_arguments, the variables supplied when deploying the Kubernetes resources product (including the cluster CA certificate, cluster endpoint URL and cluster name, among others). As a result, the deployment succeeds because it fulfils the conditions required to interact with the cluster. The perform_destroy function, on the other hand, has no artifact_arguments passed to it that would allow cluster interaction, so it fails with the errors reported above.
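The kind of change we have in mind can be sketched as follows. This is not the engine's actual implementation: `build_terraform_command` is a hypothetical helper, and we assume here, purely for illustration, that artifact_arguments can be treated as a mapping of variable names to values. The `-auto-approve`, `-input=false` and `-var` options are standard Terraform CLI flags accepted by both `apply` and `destroy`.

```python
from typing import List, Mapping


def build_terraform_command(
    action: str, artifact_arguments: Mapping[str, str]
) -> List[str]:
    """Build a terraform command line that carries the same variables
    for both apply and destroy (hypothetical helper, for illustration)."""
    if action not in ("apply", "destroy"):
        raise ValueError(f"unsupported action: {action}")

    command = ["terraform", action, "-auto-approve", "-input=false"]
    # Pass every supplied variable, so destroy can reach the cluster
    # with the same settings that apply used.
    for name, value in artifact_arguments.items():
        command += ["-var", f"{name}={value}"]
    return command
```

For example, `build_terraform_command("destroy", {"cluster_name": "demo"})` produces a destroy command that includes `-var cluster_name=demo`, mirroring what the apply path already receives.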

Our plea

We ask that you apply your creative abilities to incorporate the changes required to address this issue.

The Potential Impact of Addressing this issue

  1. Overall increase in adoption of the Service Catalog approach to simplified self-service infrastructure provisioning, due to customer engagement success.
  2. A single, convenient tool to provision infrastructure, resulting in increased productivity for Developers and Engineers.
  3. A win for the Service Team, others and us.