-
The current approach to multi-slicing training is not robust or reliable on spot instances.
There has been some discussion inside CRFM and with the GCP team on this topic. I am creating this issue to capture the …
-
### Search before asking
- [X] I searched the [issues](https://github.com/ray-project/kuberay/issues) and found no similar issues.
### KubeRay Component
ray-operator
### What happened + What you …
-
### Motivation: Why do you think this is important?
We start a Ray cluster and the Ray dashboard when running Ray tasks. Currently, we can open the dashboard UI through a NodePort. However, some ingress…
-
There is an alternative to Xdebug called [Ray](https://myray.app/docs/getting-started/introduction). As I'm not a researcher, I'm not sure whether this solution will also be good enough. It's for sure grea…
-
**Describe the bug**
I have Lambda A, which calls Lambda B via HTTP through API Gateway. The spans are not joined; instead, I end up with two separate traces when using any tool other than X-Ray.
**S…
-
### What happened + What you expected to happen
Accessing the Ray dashboard of the AWS cluster does not work. I get an empty page, similar to https://github.com/ray-project/ray/issues/39564…
-
We need to replace the Ray on EKS website doc with JARK and remove the Ray on EKS blueprint.
-
### What happened + What you expected to happen
I have found that the Ray autoscaler sometimes mistakenly kills nodes that are still working. My scenario is that 400 Ray tasks are submitted at the same ti…
-
Ray version : ray 2.10
llm-on-ray : latest from main branch
command used to run : llm_on_ray-finetune --config_file llm-on-ray/llm_on_ray/finetune/finetune.yaml
**RuntimeError: oneCCL: atl_ofi_…