awslabs / data-on-eks

DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS
https://awslabs.github.io/data-on-eks/
Apache License 2.0

Unable to deploy llama2 on EKS/Ray Serve/inf2 #429

Closed harishvs closed 1 month ago

harishvs commented 7 months ago

Description


Versions

Reproduction Code [Required]

Steps to reproduce the behavior: I tried to follow the instructions in this tutorial: https://awslabs.github.io/data-on-eks/docs/gen-ai/inference/Llama2

No

Expected behavior

The pods in the llama2 workspace start up properly and the Ray Serve deployment succeeds.

Actual behavior

The pods in the llama2 workspace are stuck in Pending. The Ray Serve deployment is stuck in the deploying state indefinitely.
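For anyone trying to reproduce this, one way to see why pods stay in Pending is to read the `PodScheduled` condition that the Kubernetes scheduler records on each pod. The following is only a minimal sketch: the helper name `pending_pod_reasons` is hypothetical, and it assumes the pod list was exported with `kubectl get pods -o json` from whichever namespace the tutorial deploys into (not confirmed in this report).

```python
import json


def pending_pod_reasons(pod_list_json: str) -> dict:
    """Map each Pending pod's name to the scheduler's stated reason, if any.

    Expects the JSON text produced by `kubectl get pods -o json`,
    i.e. a List object with an "items" array of pods.
    """
    reasons = {}
    for pod in json.loads(pod_list_json).get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue  # only Pending pods are of interest here
        message = "no PodScheduled condition reported yet"
        for cond in status.get("conditions", []):
            # The scheduler sets PodScheduled=False with a reason and message
            # (for example, Unschedulable) when it cannot place the pod.
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                message = f'{cond.get("reason", "Unknown")}: {cond.get("message", "")}'
        reasons[pod["metadata"]["name"]] = message
    return reasons
```

A possible usage is to save the output of `kubectl get pods -n <namespace> -o json` to a file and pass its contents to `pending_pod_reasons`. An `Unschedulable` reason that mentions insufficient resources would suggest that no node with the requested capacity is available, though the actual cause in this report was not recorded.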

Terminal Output Screenshot(s)

Additional context

harishvs commented 7 months ago

Please assign this bug to me. I have put up a PR for this.

vara-bonthu commented 7 months ago

@harishvs Have you tried the latest code? Please close this issue if it's not reproducible. Thanks.

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

github-actions[bot] commented 1 month ago

Issue closed due to inactivity.