-
### What happened + What you expected to happen
1. This template works on NVIDIA A10 GPUs on AWS (g5.xlarge instances): https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/aws/exampl…
-
## Update
A first pass of an MMI implementation was merged in #340, then improved in #402 and in individual commits.
A redesign of the data structure was laid out in #524 and implemented in #523.
This ti…
-
A recent deployment of Nebari 2024.03.03 on AWS with a `g4dx.xlarge` GPU profile has led to an issue where, despite the CUDA-related packages appearing correctly configured, `torch.cuda.is_availabl…
-
Currently we use `DCGM_FI_DEV_GPU_TEMP` to obtain the instance/GPU list, but this metric is not collected in vGPU clusters. This prevents the dashboard from displaying properly.
https://github…
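
One possible direction, as a minimal sketch only: derive the instance/GPU list from the first metric that actually has samples, instead of relying on `DCGM_FI_DEV_GPU_TEMP` alone. The function name `gpu_list`, the sample layout, and the choice of `DCGM_FI_DEV_GPU_UTIL` as a fallback are assumptions for illustration. Which metrics are really exported in a vGPU cluster, and whether the series carry `Hostname` and `gpu` labels, would need to be checked against the exporter in use.

```python
from __future__ import annotations

from typing import Iterable, Mapping, Sequence

# One scraped series, flattened to its labels (hypothetical layout):
# {"__name__": "<metric name>", "Hostname": "...", "gpu": "..."}
Sample = Mapping[str, str]


def gpu_list(samples: Iterable[Sample], candidates: Sequence[str]) -> list[tuple[str, str]]:
    """Return sorted unique (Hostname, gpu) pairs from the first candidate
    metric that has at least one sample; return [] if none has any."""
    samples = list(samples)
    for metric in candidates:
        pairs = {
            (s.get("Hostname", ""), s.get("gpu", ""))
            for s in samples
            if s.get("__name__") == metric
        }
        if pairs:
            return sorted(pairs)
    return []


# Example: the temperature metric is missing, so the fallback is used.
samples = [
    {"__name__": "DCGM_FI_DEV_GPU_UTIL", "Hostname": "node-a", "gpu": "1"},
    {"__name__": "DCGM_FI_DEV_GPU_UTIL", "Hostname": "node-a", "gpu": "0"},
]
gpus = gpu_list(samples, ["DCGM_FI_DEV_GPU_TEMP", "DCGM_FI_DEV_GPU_UTIL"])
```

The candidate order expresses preference, so clusters that do collect `DCGM_FI_DEV_GPU_TEMP` would behave as before.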
-
### 🐛 Describe the bug
## Description
When using PyTorch 2.x with multiprocessing (`torch.multiprocessing.Pool`), there is a significant performance degradation (100x) compared to PyTorch 1.13.1 whe…
-
Hi, thank you for your work. I have an issue with multiple GPUs. I have four GPUs and have added four screens. I run some Android simulators on the screens. I found that all four screens load on a single GPU. Is t…
-
### Checklist [README]
- [X] Device is not undervolted nor overclocked
- [X] Device is using the [latest drivers](https://github.com/IGCIT/Intel-GPU-Community-Issue-Tracker-IGCIT/wiki/Intel-driver…
-
### Your current environment
```text
Collecting environment information...
WARNING 08-05 21:59:09 _custom_ops.py:15] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm.…
-
I followed the guide: [arena/docs/userguide/9-top-job-gpu-metric.md](https://github.com/kubeflow/arena/blob/master/docs/userguide/9-top-job-gpu-metric.md).
Everything works as expected until the last one, when…
-
### What is the issue?
On Linux, I use the following command to start the Ollama server:
CUDA_VISIBLE_DEVICES=1,2,3,4,5 OLLAMA_MAX_LOADED_MODELS=5 ./ollama-linux-amd64 serve&
Then I want to run several py…
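
For reference when reading the launch command above, here is a minimal sketch of how a `CUDA_VISIBLE_DEVICES` value renumbers devices for the process that inherits it: CUDA exposes the listed devices as logical indices starting at 0. The helper name `visible_device_map` is hypothetical, and the sketch only handles a plain comma-separated list; it does not model UUID entries or CUDA's handling of invalid entries. The listed indices follow CUDA's own enumeration order, which may differ from the order shown by `nvidia-smi` unless `CUDA_DEVICE_ORDER=PCI_BUS_ID` is set.

```python
from __future__ import annotations


def visible_device_map(value: str) -> dict[int, str]:
    """Map logical device index (as seen inside the process) to the
    entry listed in CUDA_VISIBLE_DEVICES, in the order given."""
    entries = [e.strip() for e in value.split(",") if e.strip()]
    return dict(enumerate(entries))


# The value from the command above: five devices become logical 0..4.
mapping = visible_device_map("1,2,3,4,5")
for logical, listed in mapping.items():
    print(f"logical device {logical} -> listed device {listed}")
```

With this value, the server process sees five devices, and the device listed as 0 on the host is not visible to it at all.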