-
**Describe the bug**
I'm not sure whether DeepSpeed needs to be adapted for Mixtral. When I ran inference with DeepSpeed, it didn't properly apply model parallelism. Instead, it…
-
### 🐛 Describe the bug
While training on a large cluster of AWS p4d instances with PyTorch, PyTorch Lightning, and DeepSpeed, we observe an IllegalMemoryAccess error that happens stochastically at …
-
**Describe the bug**
When fine-tuning my model using deepspeed==0.13.5 and the Hugging Face Trainer, loss and grad_norm become NaN at step 2.
![image](https://github.com/microsoft/DeepSpeed/assets/29994…
-
How can I fix this error? I ran the command `torchrun --nproc_per_node=1 perf.py --msa-length 128 --res-length 256`, and then the following error appeared.
The versions of PyTorch, Python, and CUDA are …
-
I get this error following the deepspeed-fastgen instructions:
```python
from mii import pipeline
pipe = pipeline("mistralai/Mistral-7B-v0.1")
```
The full stack trace is:
```
Loading ext…
-
Hi,
Thanks for the wonderful work. I am planning to do some conformation sampling with it, but unfortunately the hard requirement of CUDA 11.6 seems to be an issue. I've tried diffe…
-
**Describe the bug**
Hello. I'm an active user of DeepSpeed for multi-node training.
I've always used ZeRO-3, but this time I tried enabling the hpZ feature of ZeRO++ for the first time. The issue…
-
Greetings!
Following the instructions, I completed the installation and everything seemed to work, including the generation of the MSA.
Specifically, I've done the recommended conda installation,…
-
**Environment:**
Ubuntu 22.04.4 LTS
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
ds_report added at the end of the description
**Issue:** Not able to…
-
I have code that runs on a g4dn.12xlarge with NVIDIA T4 GPUs (4 x 16 GB, ~64 GB total).
## Issue
My code:
``` python
import mii
model = "meta-llama/Llama-2-13b-chat-hf"
mii_configs = {
"ten…