-
**Describe the bug**
I encountered an issue when using DeepSpeed 0.12.4 with the [OpenChat trainer](https://github.com/imoneoi/openchat), where checkpointing failed and raised an NCCL error. However,…
-
I'm confused about DAP:
1. Can the dap_size parameter only be set to 2, meaning row and column?
2. Should the input data be complete, or do I need to split it by dap_size before passing it in?
Thanks.
-
**Describe the bug**
![image](https://github.com/microsoft/DeepSpeed/assets/138777240/5bd06bbc-0c50-42f1-a04f-9367a5ab801a)
I was trying to run the script shown above and ran into this error:
…
-
**Describe the bug**
When I load CLIP via CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336") within deepspeed, none of the model weights are loaded (i.e. they are a tensor of size z…
-
Hello, there! I am trying to implement the multimer module of OpenFold by using pre-computed MSAs. As a proof of concept, I am using the protein provided as an example in the tutorial, as well as its …
-
In a comment on a closed ticket (https://github.com/microsoft/DeepSpeed/issues/3342#issuecomment-1826447914), I have detailed why the current instructions are unclear (along with photos showing what the Dee…
-
**Describe the bug**
When comparing zero-1 and zero-2, I noticed discrepancies between the results in the DeepSpeed Flops Profiler and the training speed metrics in transformers, and the conclusions …
-
**Describe the bug**
Circular import error with PyTorch nightly. If I uninstall DeepSpeed, it works fine.
```
Traceback (most recent call last):
File "/test/oss.py", line 322, in
mp.spawn…
-
I am using finetune_lora.sh with zero3_offload.json to train (context below) and get the following error.
```
Traceback (most recent call last):
File "/deep/u/emily712/GeoChat/geochat/train/tr…
-
**Describe the bug**
Mistral doesn't fully convert to the DeepSpeed format despite support in the v2 module.
**To Reproduce**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
impor…