-
**Describe the bug**
StarCoder inference with AutoTP doesn't work.
I get the following error:
```
File "[...]/venv38/lib64/python3.8/site-packages/transformers/models/gpt_bigcode/modeling_gpt_b…
```
Epliz updated 8 months ago
-
**Describe the bug**
I installed DeepSpeed with `pip install deepspeed` and tried to use DeepSpeedCPUAdam, but got this error:
```
Exception ignored in:
Traceback (most recent call last):
File …
```
-
auto.json:
```
{
"train_micro_batch_size_per_gpu": "auto",
"fp16": {
"enabled": true
},
"autotuning": {
"enabled": true,
"fast": false,
"overwrite": t…
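```

For completeness, an autotuning section of this shape typically looks like the sketch below. Only the keys visible in the truncated snippet above come from the report; the remaining values (`metric`, `results_dir`, `exps_dir`) are assumptions based on the DeepSpeed autotuning documentation, not the reporter's actual file:

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "fp16": {
    "enabled": true
  },
  "autotuning": {
    "enabled": true,
    "fast": false,
    "overwrite": true,
    "metric": "throughput",
    "results_dir": "autotuning_results",
    "exps_dir": "autotuning_exps"
  }
}
```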
-
**Describe the bug**
When I run training for RLHF step 3:
```
Actor_Lr=9.65e-6
Critic_Lr=5e-6
#--data_path Dahoas/rm-static \
#--offload_reference_model \
deepspeed --master_port 12346 main_step3.py…
```
-
**Describe the bug**
DeepSpeed segfaults when loading CPU_ADAM, with both ZeRO-2 and ZeRO-3 configs via the Hugging Face transformers integration.
**ZeRO Configurations**
- Zero-2
```
{
"fp16":…
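```

Since the config above is cut off, here is a minimal ZeRO-2 + fp16 config of the kind commonly used with the Hugging Face integration; everything beyond the `fp16` key shown above is an assumption, not the reporter's actual file. The `offload_optimizer` block is what pulls in the CPU_ADAM kernel mentioned in this report:

```json
{
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu"
    }
  },
  "train_micro_batch_size_per_gpu": "auto"
}
```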
-
**Describe the bug**
I'm currently using the HF Trainer for training, with the HF learning rate scheduler and DeepSpeed optimizer. I've encountered an issue with loading universal checkpoints. The HF…
-
**Describe the bug**
After upgrading to DeepSpeed 0.14.3, training makes no progress because all gradients and gradient norms are zero. From git bisect, I believe it was introduced by this PR:
https://git…
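The `git bisect` workflow mentioned above can be automated with `git bisect run`, which replays a test command across commits and reports the first failing one. A minimal, self-contained sketch on a throwaway repository (the repo and the "bug" here are invented for illustration, not DeepSpeed itself):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "ci@example.com"
git config user.name "ci"

# Four commits; the "bug" (state flips from good to bad) lands at c3.
echo good > state; git add state; git commit -qm "c1: ok"
git commit -q --allow-empty -m "c2: ok"
echo bad > state; git commit -qam "c3: introduces bug"
git commit -q --allow-empty -m "c4: still bad"

# bad = HEAD (c4), good = HEAD~3 (c1)
git bisect start HEAD HEAD~3
# Exit 0 marks a commit good, nonzero marks it bad; bisect converges on c3.
git bisect run grep -q good state | tee bisect.out
git bisect reset
```

The key convention is the test command's exit code: `git bisect run` treats 0 as "good" and 1–127 (except 125, which means "skip") as "bad", then prints the first bad commit.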
-
**Describe the bug**
For ZeRO-3, I'm noticing increased training times on g5.48xlarge nodes with torch >= 2.3.1 and CUDA 12.1. I can reproduce this with both small and large models, and in some cases…
-
After installing deepspeed 0.15.0 via pip3, I ran ds_report to check the compatibility of various features.
I get the following messages when checking GDS compatibility:
```
[2024-08-29 15:16:37,…
```
-
**Describe the bug**
When training [llama-vid](https://github.com/dvlab-research/LLaMA-VID) (stage 2, full fine-tuning of LLaMA) with deepspeed==0.14.0 and the transformers Trainer, grad_norm becomes nan (or 1…
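A cheap way to catch the symptom reported above is to check the global gradient norm for non-finite values every step. A framework-free sketch in plain Python (the helper names and stand-in numbers are invented for illustration; in a real Trainer run the per-parameter norms would come from the model's gradients):

```python
import math

def global_grad_norm(per_param_norms):
    """L2 norm over a list of per-parameter gradient norms.

    A single NaN or inf in any parameter's gradient propagates into
    the global norm, which is exactly what loggers report as grad_norm.
    """
    return math.sqrt(sum(n * n for n in per_param_norms))

def check_step(per_param_norms, step):
    """Raise as soon as the global grad norm goes non-finite."""
    norm = global_grad_norm(per_param_norms)
    if math.isnan(norm) or math.isinf(norm):
        raise RuntimeError(f"step {step}: non-finite grad_norm {norm}")
    return norm

# Healthy step: prints a small finite norm.
print(check_step([0.1, 0.2], 0))
```

Failing fast like this makes a bisect (as in the report two entries up) much quicker, since the bad run aborts on the first broken step instead of silently training on zero or NaN gradients.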