-
### What happened + What you expected to happen
We're running into a weird issue where a single replica is stuck in the STARTING state for a while, even though resources are available.
```
Deplo…
```
-
### Description
Ray currently relies on static configurations for task scheduling, limiting efficiency during dynamically changing workloads. Adding adaptive scaling would allow clusters to automatic…
-
I often see users run into common error scenarios that end up hanging the entire workload.
1. Scheduling something on the head node when the head node has `num_cpus=0`
2. Scheduling a task that requests…
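Scenario 1 above is an instance of a request that no node can ever satisfy. As a rough illustration only, the sketch below checks requests against per-node capacities before anything is submitted. It is plain Python and does not use Ray's API; the helper names `fits` and `find_infeasible`, the dict-based resource format, and the example cluster are all hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: resource name -> amount (not Ray's data model).
Resources = Dict[str, float]


def fits(request: Resources, capacity: Resources) -> bool:
    """Return True if a node with this total capacity could ever satisfy the request."""
    return all(capacity.get(name, 0.0) >= amount for name, amount in request.items())


def find_infeasible(requests: Dict[str, Resources], nodes: Dict[str, Resources]) -> List[str]:
    """Return the names of requests that no single node's total capacity can satisfy."""
    return [
        name
        for name, request in requests.items()
        if not any(fits(request, capacity) for capacity in nodes.values())
    ]


# Assumed example: a cluster with only a head node that was started with zero CPUs.
nodes = {"head": {"CPU": 0.0}}
requests = {"my_task": {"CPU": 1.0}}

infeasible = find_infeasible(requests, nodes)
print(infeasible)
```

A check like this could fail fast with a clear message instead of letting the workload hang; whether it belongs in the scheduler or in a client-side lint step is a design question this sketch does not settle.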
-
### System Info
- CPU arch x86_64
- CPU memory 2 TB
- GPU 8xH100
- Libraries TensorRT-LLM v0.12.0
- CUDA 12.5
- Driver version 555.42.06
- OS Ubuntu
### Who can help?
@kaiyux
### Information
- […
-
Thanks for the great work and an awesome library.
I am using spark-rapids with EMR-7.3 for deep learning model inference with predict_batch_udf.
I have been following the provided documentation f…
-
### What happened + What you expected to happen
A single actor that is pending node assignment is causing the script to hang forever, even though the resources are available, and the …
-
Source File: [/docs/tasks/manage-gpus/scheduling-gpus.md](https://github.com/kubernetes/website/blob/master/content/en/docs/tasks/manage-gpus/scheduling-gpus.md)
Diff command reference:
```bash
# View the update differences between the original document and the translated document…
```
-
Source File: [/docs/tasks/manage-gpus/scheduling-gpus.md](https://github.com/kubernetes/website/blob/release-1.16/content/en/docs/tasks/manage-gpus/scheduling-gpus.md)
Command to view the update diff of the original document:
```bash
git…
```
-
Secondary isolates currently do not have access to any of the window bindings that back `dart:ui`. Clients that attempt to use parts of the Flutter API that depend on these bindings run into exceptions beca…
-
**How to categorize this issue?**
/area auto-scaling
/kind enhancement
/platform gcp
**What would you like to be added**:
Permit specification of custom extended resources in the `worker.providerCo…