deepspeed-library Search Results

1000+ results for deepspeed-library


microsoft/DeepSpeed-MII #452

inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_li…

**Environment:** Ubuntu 22.04.4 LTS Cuda compilation tools, release 12.1, V12.1.66 Build cuda_12.1.r12.1/compiler.32415258_0 ds_report added at the end of the description **Issue:** Not able to…

Andronixs updated 6 months ago
6
ncoop57/i-am-a-nerd #13

DeepSpeed Investigation: What I Learned | IAmANerd

# DeepSpeed Investigation: What I Learned | IAmANerd An investigation into the awesome DeepSpeed library for training large models on a single GPU! [https://nathancooper.io/i-am-a-nerd/deepspeed/dee…

utterances-bot updated 2 years ago
5
microsoft/DeepSpeed #3207

[BUG]error: can't copy 'deepspeed/accelerator': doesn't exis…

**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: [the official doc](https://github.com/microsoft/DeepSpeed/blob/master/b…

ucas010 updated 1 week ago
9
pytorch/pytorch #139742

Hooks on param AccumulateGrad are not called when the param …

### 🐛 Describe the bug Background of the issue: DeepSpeed depends heavily on param.data = other.data for ZeRO3 parameter offload. ZeRO3 also depends on registering a hook on param AccumulateGrad ob…

jerrychenhf updated 12 hours ago
3
PKU-YuanGroup/Open-Sora-Plan #504

AMD card support?

As an owner of a Radeon 7900 XTX, I'm wondering if this project could be made to support AMD cards too. The problem is the `xformers` dependency which does not support AMD cards. Does Open-Sora-Plan u…

agronholm updated 1 week ago
5
THUDM/ChatGLM-6B #1154

[BUG/Help] Error when training on multiple GPUs at full precision: torch_extensions/py39_cu116/utils/uti…

### Is there an existing issue for this? - [X] I have searched the existing issues ### Current Behavior When training on multiple GPUs at full precision, compiling the torch extensions fails with: In file included from /home/adamzhangchao/anaconda3/e…

peterzhang2029 updated 1 year ago
10
microsoft/DeepSpeedExamples #147

bing_bert script error

Error occurred running bing_bert/ds_train_bert_nvidia_data_bsz64k_seq128.sh >Detected CUDA files, patching ldflags Emitting ninja build file /home/bduser/.cache/torch_extensions/py38_cu114/fused_l…

jeyblu updated 2 years ago
4
microsoft/DeepSpeed #5653

[BUG] oneapi/ccl.hpp: No such file or directory.

**Describe the bug** The builds on conda-forge have been failing since `deepspeed=0.14.1` for CUDA 11.8 and 12.0 with an error like `fatal error: oneapi/ccl.hpp: No such file or directory`. Origina…

weiji14 updated 2 days ago
12
huggingface/transformers #26706

Add an option to decide whether to store the checkpoint and …

**Motivation:** Currently, when using the Transformers library in combination with DeepSpeed for training large language models like LLMs, checkpoints (e.g. `bf16_zero_pp_rank_0_mp_rank_00_optim_stat…

timturing updated 11 months ago
7
philschmid/deep-learning-pytorch-huggingface #49

Out of Memory: Cannot reproduce T5-XXL run on 8xA10G.

I am trying to reproduce the FLAN-T5-XXL (11B) results from [this blog post](https://www.philschmid.de/fine-tune-flan-t5-deepspeed). I have an 8xA10G instance. Since the blog shows that you can run…

slai-natanijel updated 7 months ago
3
