-
Relevant links:
- inference docs: https://deepspeed.readthedocs.io/en/latest/inference-init.html
- Getting started tutorial: https://www.deepspeed.ai/tutorials/inference-tutorial/
- init_distribute…
-
# Prerequisites
I have searched and tried for a week now.
# Expected Behavior
I am deploying five LLMs on an A100 40GB GPU, each running a 6GB model (which is the Llama-3 8B Instruct mode…
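A quick back-of-the-envelope VRAM check for this setup. The per-model overhead figure below is an illustrative assumption (KV cache plus CUDA context vary by configuration), not a measured value:

```python
# Rough VRAM budget: five ~6 GB models on a 40 GB A100.
# overhead_gb is an assumed per-instance allowance for KV cache
# and CUDA context, not a measured number.
n_models = 5
weights_gb = 6.0
overhead_gb = 1.0
total_gb = n_models * (weights_gb + overhead_gb)
print(f"estimated total: {total_gb:.0f} GB of 40 GB")  # 35 GB of 40 GB
```

Even with a modest per-instance overhead the five models fit on paper, so an out-of-memory failure here usually points at framework-level duplication (e.g. each process initializing its own full context) rather than the raw weight sizes.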
-
### What happened?
I am using Llama.cpp + SYCL to perform inference with Qwen2 MoE. The prediction output seems normal, but the following lines in the debug log indicate that the model is not offloa…
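One way to confirm what the loader actually did is to parse the load-time log. The log line format below is assumed from typical llama.cpp output and may differ between versions; the sample string is illustrative, not taken from this report:

```python
import re

# Hedged sketch: check how many layers llama.cpp reports as offloaded.
# The "llm_load_tensors: offloaded X/Y layers to GPU" format is an
# assumption about typical llama.cpp log output.
log_line = "llm_load_tensors: offloaded 0/25 layers to GPU"

m = re.search(r"offloaded (\d+)/(\d+) layers to GPU", log_line)
if m:
    offloaded, total = map(int, m.groups())
    print(f"{offloaded}/{total} layers on GPU")
    if offloaded == 0:
        # Entirely on CPU: the GPU-layer count was likely never set.
        print("check the -ngl / --n-gpu-layers flag")
```

If the parsed count is zero, the usual first suspect is that `-ngl` was left at its default rather than a SYCL-specific failure.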
-
### Issue Type
Others
### OS
Linux
### onnx2tf version number
1.19.11
### onnx version number
1.15.0
### onnxruntime version number
1.16.3
### onnxsim (onnx_simplifier) version number
0.4.3…
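For reproducing version reports like the one above, a small helper can read installed versions directly instead of checking each package by hand. The package names listed are taken from the report; whether each is importable in a given environment is not assumed:

```python
from importlib.metadata import version, PackageNotFoundError

def pkg_version(name: str) -> str:
    """Return the installed version of a package, or a placeholder."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"

# Package names from the issue template above.
for pkg in ("onnx2tf", "onnx", "onnxruntime", "onnx-simplifier"):
    print(pkg, pkg_version(pkg))
```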
-
Hi, I just read about DOC and have some questions. I have been developing in Python for a few years but have never really worked with AI yet. But this project fascinates me because it may be the help/assista…
-
### What happened?
Unable to get Ollama to use the GPU for processing. I am following the guide [provided](https://github.com/otwld/ollama-helm).
Pod log attached below; the GPU is not detected but runni…
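For reference, a minimal GPU section for the chart's values file might look like the sketch below. The exact keys are an assumption based on my reading of the ollama-helm chart and should be verified against the chart's own `values.yaml`; the node must also have the NVIDIA device plugin installed for the resource request to be satisfiable:

```yaml
# Hedged sketch of ollama-helm GPU values -- verify key names
# against the chart's values.yaml before use.
ollama:
  gpu:
    enabled: true
    type: nvidia
    number: 1
```

If the pod still falls back to CPU with a configuration like this, checking that the pod actually received an `nvidia.com/gpu` resource allocation is a useful next step.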
-
**Steps to reproduce:**
1. Run a Docker container using `ollama/ollama:rocm` on a machine with a single MI300X
2. Inside the container, run `ollama run llama3.1:70B`
**Actual behaviour:**
```
…
-
_placeholder for brainstorm_. Finished all master's courses (part-time side job).
Have spent a month exploring what a good master's thesis direction around LLMs would be.
Draft master thesis (again placeholder): *…
-
**Describe the bug**
Unable to use/test fp6 quantization in DeepSpeed 0.14 in inference mode on a GPT2 model. There is little documentation on usage right now, so I am not sure if I have the wrong init metho…
-
### What is the issue?
My machine has two CPUs and no GPU, and when I run the model, I find that the CPUs are at most 50% utilized
![PixPin_2024-07-31_16-11-31](https://github.com/user-attachments/ass…
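A ceiling of roughly 50% on a dual-socket machine is often consistent with the runtime sizing its thread pool for a single socket (or being NUMA-pinned to one). The core counts below are assumptions for the arithmetic, not this machine's actual topology:

```python
# Hedged illustration: if the inference runtime spawns one thread per
# physical core on a single socket, a dual-socket box tops out near 50%.
# sockets and cores_per_socket are example values, not measured ones.
sockets = 2
cores_per_socket = 32
threads_used = cores_per_socket  # pool sized for one socket
utilization = threads_used / (sockets * cores_per_socket)
print(f"max aggregate utilization: {utilization:.0%}")  # 50%
```

If this matches the observed behavior, raising the runtime's thread-count setting (or removing any NUMA/CPU-affinity restriction) is the natural experiment to try.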