castai / helm-charts

CAST AI Kubernetes helm charts
Apache License 2.0

Cast AI setup failing #231

Closed ganesh-kumarmt closed 1 year ago

ganesh-kumarmt commented 1 year ago

Hi Team,

We are seeing the following failure while setting up castai-agent:


k logs -n castai-agent castai-agent-bc568646c-dpngq                                    
time="2023-02-08T13:39:13Z" level=info msg="running agent version: GitCommit=\"0ebd8d1fa65524cebda49b791a4e9e4a1fceb0b2\" GitRef=\"refs/tags/v0.42.1\" Version=\"v0.42.1\"" version=v0.42.1
time="2023-02-08T13:39:13Z" level=info msg="platform URL: https://api.cast.ai" version=v0.42.1
time="2023-02-08T13:39:13Z" level=info msg="starting healthz on port: 9876" version=v0.42.1
time="2023-02-08T13:39:13Z" level=error msg="agent stopped with an error: healthz server: http: Server closed" version=v0.42.1
time="2023-02-08T13:39:13Z" level=fatal msg="agent failed: getting provider: configuring aws client: getting instance region: EC2MetadataRequestError: failed to get EC2 instance identity document\ncaused by: EC2MetadataError: failed to make EC2Metadata request\nrequest blocked by allow-route-regexp \"^$\": /latest/dynamic/instance-identity/document\n\n\tstatus code: 404, request id: " version=v0.42.1

Best regards, Ganesh Kumar

ganesh-kumarmt commented 1 year ago

Quick update: this was an EKS cluster with KIAM running, and we encountered the same issue as https://github.com/uswitch/kiam/issues/359

We tried on a non-KIAM EKS cluster and the agent was able to connect. Please let me know whether there is a solution for this, or whether we need to upgrade KIAM as mentioned in the other issue.

saumas commented 1 year ago

Hi @ganesh-kumarmt, sorry for the late reply. Access to the metadata endpoint is optional. You can override every property that the agent discovers through the metadata by setting the corresponding environment variables. We have a short doc describing your situation here: https://docs.cast.ai/docs/troubleshooting#your-cluster-does-not-appear-in-the-connect-cluster-screen

In short, add these environment variables to the castai-agent deployment:

- name: EKS_ACCOUNT_ID
  value: "000000000000"    # your aws account id
- name: EKS_REGION
  value: "eu-central-1"    # your eks cluster region
- name: EKS_CLUSTER_NAME
  value: "staging-example" # your eks cluster name
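
For context, here is a minimal sketch of where that list sits inside the Deployment spec. The Deployment name and namespace are inferred from the pod name in the log above; the container name `agent` is an assumption and may differ in your chart version, so check it with `kubectl get deployment castai-agent -n castai-agent -o yaml` before applying anything. All values are the same placeholders as above.

```yaml
# Partial Deployment manifest (sketch). Only the fields relevant to the
# metadata override are shown; everything else stays as installed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: castai-agent          # inferred from pod castai-agent-bc568646c-dpngq
  namespace: castai-agent
spec:
  template:
    spec:
      containers:
        - name: agent         # assumption: verify the real container name
          env:
            - name: EKS_ACCOUNT_ID
              value: "000000000000"    # placeholder: your AWS account id
            - name: EKS_REGION
              value: "eu-central-1"    # placeholder: your EKS cluster region
            - name: EKS_CLUSTER_NAME
              value: "staging-example" # placeholder: your EKS cluster name
```

With all three values set, the agent should no longer need the instance identity document request that KIAM is blocking in the log above.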

Also, the agent needs access to the AWS EC2 API, which it uses to correctly identify the node life cycle (spot or on-demand). If you have disabled EC2 auth by default for pods, you will need to give the agent a way to authenticate to the AWS EC2 service. It uses the official AWS SDK, so any of the AWS-supported auth mechanisms should work: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html

The AmazonEC2ReadOnlyAccess managed policy should be enough.

FYI: AWS EC2 API access is optional, but if you do not provide it, we have no surefire way to determine the life cycle of a node, and any spot instances you use will be treated as on-demand.
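
As one possible way to supply credentials, the sketch below uses the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables described on the AWS page linked above, read from a Kubernetes Secret. The Secret name `castai-agent-aws` and its keys are hypothetical; the credentials are assumed to belong to an IAM identity with AmazonEC2ReadOnlyAccess attached. Other SDK-supported mechanisms would work too.

```yaml
# Sketch only: extra entries for the castai-agent container's env list.
# The Secret "castai-agent-aws" and its key names are hypothetical and
# must be created separately in the castai-agent namespace.
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: castai-agent-aws
      key: access-key-id
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      name: castai-agent-aws
      key: secret-access-key
```

Reading the values from a Secret keeps the keys out of the Deployment manifest itself.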

ganesh-kumarmt commented 1 year ago

Thanks @saumas, we are good with the setup now.