-
OS: Windows 10
Two GPUs, a 3090 and a 4090
Compared to Stable Diffusion on one GPU (the 4090) with `--precision full --no-half --xformers`, it's the same speed!
-
When I launch the latest Ollama 0.2.8 it uses only one GPU, but when I use Ollama 0.1.30 it uses all the GPUs. The fix that was applied in 0.1.30 didn't make it into 0.2.8.
Here are the logs:
[log_ollama.txt](h…
-
We tried to launch the model worker on a machine with multiple RTX 3090 GPUs, but we couldn't use this command `python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --p…
-
```
      GPU0  GPU1  CPU Affinity  NUMA Affinity
GPU0   X    NV4   20-39,60-79   1
GPU1  NV4    X    20-39,60-79   1
```
(Apparently `nvidia-smi topo -m` output: `NV4` means the two GPUs are connected by a bonded set of four NVLinks, and both sit on NUMA node 1 with CPU cores 20-39/60-79.)
![image](https://github.com/DeltaGroupNJUPT/Vina-GPU-2…
-
Hi,
It seems the model runs on only one GPU by default. Is it possible to run it on multiple GPUs? My machine has 4 GPUs, but the memory of each is only about 10 GB, which is not enough to run the m…
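A common way to pool several small GPUs for one model (a hedged sketch, not necessarily this project's supported path) is Hugging Face Accelerate's `device_map="auto"`, which shards layers across all visible GPUs; the model ID and the `max_memory` caps below are placeholders for a 4× ~10 GB setup:

```python
# Sketch: shard one model across four ~10 GB GPUs with Accelerate's device_map.
# "some/model" is a placeholder; requires transformers + accelerate installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "some/model",
    torch_dtype=torch.float16,                  # halves memory vs. fp32
    device_map="auto",                          # split layers across visible GPUs
    max_memory={i: "9GiB" for i in range(4)},   # leave headroom on each card
)
tok = AutoTokenizer.from_pretrained("some/model")
inputs = tok("Hello", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```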
-
Error:
AttributeError: Can't pickle local object 'add_hook_to_module.<locals>.new_forward'
[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sh…
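This error typically shows up when something tries to pickle a model that Accelerate has hooked (for example, one loaded with `device_map="auto"`): `add_hook_to_module` replaces `forward` with a local closure, and closures can't be pickled. A minimal sketch of a workaround, assuming the hooks do come from Accelerate and using `gpt2` purely as a stand-in model:

```python
# Sketch: strip Accelerate's device hooks before pickling/saving a model.
import pickle
from transformers import AutoModelForCausalLM
from accelerate.hooks import remove_hook_from_module

model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")
# pickle.dumps(model) here can raise:
#   AttributeError: Can't pickle local object 'add_hook_to_module.<locals>.new_forward'
# because the hooked forward() is a local closure that pickle cannot serialize.
remove_hook_from_module(model, recurse=True)  # detach the hooks first
data = pickle.dumps(model)                    # now serializes cleanly
```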
-
### What happened?
I am using llama.cpp + SYCL to perform inference on a multi-GPU server. However, I get a segmentation fault when using multiple GPUs. The same model can produce inference output…
-
As described here: https://huggingface.co/docs/diffusers/en/training/distributed_inference#pytorch-distributed
Has it been tested? I'm wondering what the best way to do this is. Any suggestions / p…
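For reference, the linked guide's recipe boils down to one pipeline copy per GPU, one process per GPU, spawned with `torch.multiprocessing`; roughly (the model ID and prompts below are illustrative):

```python
# Sketch of data-parallel diffusers inference per the linked HF guide:
# each rank loads its own pipeline and renders a different prompt.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from diffusers import DiffusionPipeline

def run_inference(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    pipe = DiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",   # illustrative model ID
        torch_dtype=torch.float16,
    ).to(f"cuda:{rank}")
    prompt = ["a photo of a dog", "a photo of a cat"][rank]  # one prompt per GPU
    pipe(prompt).images[0].save(f"result_rank{rank}.png")

if __name__ == "__main__":
    world_size = 2  # number of GPUs
    mp.spawn(run_inference, args=(world_size,), nprocs=world_size, join=True)
```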
-
I have two 2070 Supers and would love to be able to use them in parallel. Would it be possible to enable memory pooling? I know it is in theory supported by PyTorch. Any chance it can be added here so that…
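In stock PyTorch, "memory pooling" usually means plain model parallelism: put different layers on different cards and move the activations between them, so two 8 GB cards hold weights neither could alone. A toy sketch (module names and sizes are made up):

```python
# Toy model parallelism across two GPUs: each half of the network lives on
# its own device, and activations hop from cuda:0 to cuda:1 in forward().
import torch
import torch.nn as nn

class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(4096, 4096).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))  # move activations to the second card

out = TwoGPUNet()(torch.randn(8, 4096))
print(out.device)  # cuda:1
```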
-
### Describe the issue
Issue: When I follow the instructions to install this repo, inference runs fine on a single GPU, but with multiple GPUs it gives errors.
Command:
```
…