-
**Describe the bug**
I installed DeepSpeed with `pip install deepspeed` and tried to use DeepSpeedCPUAdam, but got this error:
```
Exception ignored in:
Traceback (most recent call last):
File …
-
I ran into a few issues trying to run https://github.com/microsoft/DeepSpeed-MII/tree/main/examples/benchmark/txt2img:
1. need to set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
2. ImportError: can…
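
For item 1, a minimal sketch of setting the variable from inside Python rather than from the shell. It assumes the assignment runs before anything imports `google.protobuf`, since the backend is picked at import time. The helper name `force_pure_python_protobuf` is hypothetical and not part of MII or protobuf.

```python
import os


def force_pure_python_protobuf() -> str:
    """Select protobuf's pure-Python backend (hypothetical helper)."""
    # Assumption: this runs before the first import of google.protobuf,
    # whether that import is direct or pulled in by a dependency.
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
    return os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"]


# Call this at the very top of the entry script, ahead of other imports.
force_pure_python_protobuf()
```

The shell equivalent is `export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python` before launching the benchmark. The pure-Python backend is slower than the default one, so this is a workaround rather than a fix.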
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…
-
**Describe the bug**
I have two Ubuntu machines connected with a 10 Gb/s Ethernet cable, and I want to use DeepSpeed to
run a model training job across both machines with pipeline parallelism, and …
-
I just ran pretrain_gpt.py, but hit the problem below.
Here are my script and library versions:
script:
#! /bin/bash
set -e
# Change for multinode config
logname=$(date +'%Y-%m-%d_%H:%M:%S')
if [ -n…
-
When I use
```
import mii
client = mii.serve("/metaai/Llama-2-13b-chat-hf")
response = client.generate(["Deepspeed is", "Seattle is"], max_new_tokens=128)
print(response)
```
to …
-
### 🐛 Describe the bug
Hi,
We use `torch.compile` to train the GPTJ3.6B model on our GPU platforms, but we got some dynamo errors and the process aborted. The error happens when runnin…
-
I was trying to run the code with the following command:
`bash scripts/ds_zero2_pretrain_gpt2_model_parallel.sh`
and I got an error like the one below.
```
deepspeed --num_nodes 1 --num_gpus 4 pretrai…
-
### Describe the bug
By default, simply adding `"report_to": "wandb"` as an argument to training_args (for HF Trainer) creates plots (say, for GPU usage) only for the master node on the wan…
-
My training environment is a Docker image pulled from `deepspeed/deepspeed:v072_torch112_cu117`,
and I run it with `docker run -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --…