-
### Configurations
- cuSPARSELt version: 0.6.2
- Hardware: 2× NVIDIA A10
- CUDA version: 12.1
- Driver: 550.90.07
### Problem
Our team is integrating cuSPARSELt into a custom Inference Engine to i…
-
🚀 The feature, motivation and pitch
# RFC: Multi-GPU Python Frontend API
This RFC compares and contrasts some ideas for exposing multi-gpu support in the python frontend.
1. The current `multigpu_sc…
-
**Describe the support request**
Hi there, this is a bit of a follow up on my previous issue (https://github.com/intel/intel-device-plugins-for-kubernetes/issues/1769).
What is the behavior of…
-
With the recent advent of large models (take Llama 3.1 405b, for example!), distributed inference support is a must! We currently support naive device mapping, which works by allowing a combination of…
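The naive device mapping mentioned above can be sketched as assigning contiguous chunks of layers to an ordered list of devices. The helper below is purely illustrative (the function name and the `layer.N` naming scheme are assumptions, not the project's actual API):

```python
def make_device_map(num_layers: int, devices: list[str]) -> dict[str, str]:
    """Map layer names to devices in contiguous chunks (naive mapping).

    Illustrative sketch only: real mappings must also account for
    embeddings, the LM head, and per-device memory limits.
    """
    per_device = -(-num_layers // len(devices))  # ceiling division
    return {f"layer.{i}": devices[i // per_device] for i in range(num_layers)}
```

For example, `make_device_map(4, ["cuda:0", "cuda:1"])` places layers 0 and 1 on the first GPU and layers 2 and 3 on the second.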
-
### System Info
python version: 3.11.9
transformers version: 4.44.2
accelerate version: 0.33.0
torch version: 2.4.0+cu121
### Who can help?
@gante
### Information
- [X] The official example sc…
-
Hi all, and thanks for this wonderful work.
Although the example in the README went smoothly for me, I got stuck when reproducing the work on [Qwen2-7b-instruct](https://huggingface.co/Qwen/Qwen2-7B-Instru…
-
### Jan version
0.5.3
### Describe the Bug
I imported many models, and some of them fail to load if I select both of my graphics cards (RTX 3060, 12 GB).
If I unselect one of them, the…
-
**Description**
When there are multiple GPUs, only one GPU is used.
**Triton Information**
Container: nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
**To Reproduce**
Follow the instructio…
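For reference, Triton model instances can be pinned to specific GPUs via `instance_group` in the model's `config.pbtxt`. The fragment below is a generic sketch (the count and GPU IDs are placeholders, not values taken from this report); with several GPUs listed, Triton places instances on each listed device:

```protobuf
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0, 1 ]
  }
]
```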
-
Currently, geo-inference only supports the use of a single GPU. I want to support the use of multiple GPUs to increase inference speed.
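One common pattern for this kind of speedup is to shard the workload (e.g. image tiles) round-robin across one worker process per GPU. The sketch below is a hypothetical illustration (the function names are invented, and the actual model call is stubbed out with result tagging):

```python
from multiprocessing import Pool

def run_on_device(args):
    device_id, tiles = args
    # A real worker would move the model to f"cuda:{device_id}" and run
    # inference on its shard; here each result is just tagged for clarity.
    return [(device_id, tile) for tile in tiles]

def infer_multi_gpu(tiles, num_gpus):
    # Round-robin sharding: tile i goes to device i % num_gpus.
    shards = [(d, tiles[d::num_gpus]) for d in range(num_gpus)]
    with Pool(num_gpus) as pool:
        per_device = pool.map(run_on_device, shards)
    # Flatten the per-device result lists back into one list.
    return [item for shard in per_device for item in shard]
```

Round-robin sharding balances load when tiles are roughly uniform in cost; uneven tiles would call for a work queue instead.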
-
### Describe the issue:
I am encountering what appears to be a GPU memory management issue when using the multi-shank configuration in Kilosort 4.0.16. Specifically, when processing data from a Neuro…