-
[rank1]: Traceback (most recent call last):
[rank1]: File "/storage/garlin/deep_learning/finetune-Qwen2-VL/finetune_distributed.py", line 200, in
[rank1]: train()
[rank1]: File "/storage/g…
-
## 🐛 Bug
Got errors when loading the mBART.cc25 pretrained model for fine-tuning on `translation_multi_simple_epoch` with FSDP.
### To Reproduce
Steps to reproduce the behavior (**always include th…
-
### 🐛 Describe the bug
I used Hugging Face training code.
I found that during the backward pass of FSDP training, the AllGather kernel doesn't overlap with the CatArrayBatchedCopy kernel, and I don't know why.
s…
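Overlap of the backward-pass AllGathers with compute is governed by FSDP's prefetch settings, so that is usually the first knob checked in this situation. A minimal sketch, assuming a plain FSDP wrap launched via torchrun; the toy model is a placeholder, and whether this restores the missing overlap is an assumption:
```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, BackwardPrefetch

# Assumes a torchrun launch so the rank/world-size env vars are set.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Placeholder model; the issue's actual Hugging Face model is not shown.
model = torch.nn.Linear(1024, 1024).cuda()

# BACKWARD_PRE schedules the next layer's AllGather while the current
# layer's gradients are still being computed, which is what produces
# communication/compute overlap in the backward pass.
fsdp_model = FSDP(
    model,
    backward_prefetch=BackwardPrefetch.BACKWARD_PRE,
    limit_all_gathers=True,
)
```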
-
### 🐛 Describe the bug
Torch does not allow getting a FULL_STATE_DICT with 2D FSDP + TP. However, if I remove the checks here:
https://github.com/pytorch/pytorch/blob/3f62b05d31d4b29d60874b05adc0e5aedbad3722/to…
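For reference, this is the plain 1D-FSDP full-state-dict path that the linked check blocks once TP is added; a minimal sketch, assuming a torchrun launch (the toy model and save path are placeholders):
```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    StateDictType,
    FullStateDictConfig,
)

dist.init_process_group("nccl")  # assumes a torchrun launch
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Plain 1D FSDP wrap; the 2D FSDP + TP case is the one the check rejects.
fsdp_model = FSDP(torch.nn.Linear(16, 16).cuda())

cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT, cfg):
    state = fsdp_model.state_dict()

if dist.get_rank() == 0:
    torch.save(state, "full_model.pt")  # hypothetical output path
```
Newer releases also expose `torch.distributed.checkpoint.state_dict.get_model_state_dict` with `StateDictOptions(full_state_dict=True)`, which is presumably the intended route for 2D meshes, though that is an assumption rather than something the issue states.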
-
The default model variant is "7b":
https://github.com/foundation-model-stack/fms-fsdp/blob/65b0ea670fa375bb0f7f6a285e7229bb96ebdd0f/fms_fsdp/config/training.py#L8
but it is not in the supported wh…
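A config-side guard would surface this mismatch immediately; a purely hypothetical sketch (`SUPPORTED_VARIANTS`, `TrainConfig`, and the variant names are illustrative, not fms-fsdp's actual code):
```python
from dataclasses import dataclass

# Hypothetical registry of supported variants; fms-fsdp keeps its own
# mapping of variant names to model configurations.
SUPPORTED_VARIANTS = {"llama2_7b", "llama2_13b", "llama2_70b"}


@dataclass
class TrainConfig:
    # Mirrors the linked default; whether "7b" is a key in the real
    # registry is exactly what the issue questions.
    model_variant: str = "7b"

    def __post_init__(self) -> None:
        if self.model_variant not in SUPPORTED_VARIANTS:
            raise ValueError(
                f"model_variant={self.model_variant!r} not in "
                f"{sorted(SUPPORTED_VARIANTS)}"
            )


cfg = TrainConfig(model_variant="llama2_7b")  # passes; the default would not
```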
-
As in the title... I spent a bit of time debugging it but haven't figured out the cause yet. E.g. running
```
tune run --nproc_per_node 2 full_finetune_distributed --config llama2/7B_full fsdp_cpu_…
```
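For context, if the truncated flag is the recipe's FSDP CPU-offload toggle (an assumption, since the name is cut off), the PyTorch-level setting it corresponds to looks roughly like this:
```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

dist.init_process_group("nccl")  # assumes a torchrun-style launch
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Placeholder model; the recipe's Llama 2 7B model is not built here.
model = torch.nn.Linear(4096, 4096).cuda()

# offload_params=True keeps the sharded parameters (and gradients) on CPU,
# moving each wrapped module's shard to GPU only around its forward/backward.
fsdp_model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
```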
-
This commit: https://github.com/pytorch/pytorch/commit/a8329676273ac12f1fadfbcdd19c500d84998345
Released in torch 2.1.0, it breaks this: https://github.com/facebookresearch/audiocraft…
-
Hello, I found a strange loss during training, as follows.
![image](https://github.com/user-attachments/assets/3732521d-d4c1-4378-9d7c-247254c068d1)
The loss in the first step is normal, but the los…
-
![image](https://github.com/AnswerDotAI/fsdp_qlora/assets/77484083/03335c76-e593-4534-9afa-84f16ff05007)
How can I fix this?
-
### 🚀 The feature, motivation and pitch
Hi PyTorch maintainers,
I am currently engaged in training multiple large language models (LLMs) sequentially on a single GPU machine, utilizing FullShard…
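For context on what the request is trying to avoid, here is a minimal sketch of sequential FSDP runs with manual teardown between models, assuming a torchrun launch; the stand-in models and the adequacy of this cleanup are assumptions, not part of the pitch:
```python
import gc
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP


def train_one(build_model):
    """Wrap one model in FSDP, train it, then release its memory."""
    model = FSDP(build_model().cuda())
    # ... the actual training loop for this model would run here ...
    del model
    gc.collect()
    torch.cuda.empty_cache()  # return cached blocks before the next model


if __name__ == "__main__":
    dist.init_process_group("nccl")  # assumes a torchrun launch
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Hypothetical stand-ins for the LLMs trained back to back.
    for build in (lambda: torch.nn.Linear(1024, 1024),
                  lambda: torch.nn.Linear(2048, 2048)):
        train_one(build)

    dist.destroy_process_group()
```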