deepspeed-library Search Results

1000+ results for deepspeed-library

Best match


microsoft/DeepSpeed #5677

[BUG] Using and Building DeepSpeedCPUAdam

**Describe the bug** I installed DeepSpeed with `pip install deepspeed` and tried to use DeepSpeedCPUAdam, but got this error: ``` Exception ignored in: Traceback (most recent call last): File …

oabuhamdan updated 3 months ago
31
microsoft/DeepSpeed-MII #159

stable diffusion fails looking for `SAFE_WEIGHTS_NAME`

Ran into a few issues trying to run https://github.com/microsoft/DeepSpeed-MII/tree/main/examples/benchmark/txt2img 1. Need to set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python 2. ImportError: can…

lizelive updated 1 year ago
1
axolotl-ai-cloud/axolotl #1706

Zero loss and nan grad_norm when Flash Attention is enabled

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…

fgdfgfthgr-fox updated 1 month ago
2
microsoft/DeepSpeed #5313

[BUG] Failed for using cpu for pipeline based training acro…

**Describe the bug** I have two Ubuntu machines connected by a 10 Gb/s Ethernet cable, and I want to use DeepSpeed to run model training across these two machines with pipeline parallelism, and …

xuanhua updated 4 months ago
10
microsoft/Megatron-DeepSpeed #124

RuntimeError: The global rank 0 is not part of the group <to…

I just use pretrain_gpt.py, but I get this problem. This is my script and library version: script: #! /bin/bash set -e # Change for multinode config logname=$(date +'%Y-%m-%d_%H:%M:%S') if [ -n…

Thewillman updated 1 year ago
2
microsoft/DeepSpeed-MII #327

fail to run llama-2-7B and llama-2-13B

when I use ``` import mii client = mii.serve("/metaai/Llama-2-13b-chat-hf") response = client.generate(["Deepspeed is", "Seattle is"], max_new_tokens=128) print(response) ``` to …

xzzWZY updated 6 months ago
2
pytorch/pytorch #116766

[Dynamo][DeepSpeed] torch._dynamo.exc.InternalTorchDynamoErr…

### 🐛 Describe the bug Hi, we use `torch.compile` to run GPTJ3.6B model training on our GPU platforms, but we got some Dynamo errors and the process aborted. The error happens when runnin…

zejun-chen updated 3 months ago
11
microsoft/DeepSpeedExamples #85

ImportError: No module named 'fused_adam'

I was trying to run the code with the following command `bash scripts/ds_zero2_pretrain_gpt2_model_parallel.sh` and I got the error below. ``` deepspeed --num_nodes 1 --num_gpus 4 pretrai…

zerojooon updated 1 year ago
3
wandb/wandb #7324

[CLI]: wandb doesn't report logs from all nodes for distribu…

### Describe the bug By default, simply adding "report_to": "wandb" as an argument for training_args (for HF Trainer) creates plots (say, for GPU usage) only for the master node on the wan…

tnnandi updated 1 week ago
2
microsoft/DeepSpeedExamples #525

[bug]AttributeError: 'DeepSpeedHybridEngine' object has no a…

My training environment is a Docker image pulled from `deepspeed/deepspeed:v072_torch112_cu117` and I run it with `docker run -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --…

qingchu123 updated 7 months ago
4

