-
Thank you for publishing the paper. I hope to get your answers to the following questions:
Normally, the training speed will decline as the number of GPUs increases. However, in the paper, with the …
-
Wonderful work!
May I know its compatibility with the ZeRO mechanism? E.g., the torch ZeroRedundancyOptimizer, DeepSpeed ZeRO-1 to ZeRO-3, and FairScale FSDP. Because I noticed that QLoRA relies on particula…
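For context, a minimal sketch of the torch ZeroRedundancyOptimizer mentioned above (not tied to QLoRA; the model and hyperparameters are placeholders):
```
# Minimal ZeroRedundancyOptimizer sketch; assumes launch via torchrun so a
# distributed process group can be initialized.
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer

dist.init_process_group("nccl")
model = torch.nn.Linear(1024, 1024).cuda()
# Each rank keeps only its shard of the AdamW state (ZeRO-1-style sharding).
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.AdamW,
    lr=1e-4,
)
```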
-
### 🐛 Describe the bug
When trying to fine-tune flan-t5-large with the `Seq2SeqTrainer` module, also passing `fsdp_transformer_layer_cls_to_wrap="T5Block"` and `fsdp="full_shard auto_wrap"`, I got at fi…
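For reference, a hedged sketch of that FSDP setup in `transformers` (the dataset and output directory are placeholders, not from the report):
```
# Sketch of the reported configuration: Seq2SeqTrainer + HF-managed FSDP.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
args = Seq2SeqTrainingArguments(
    output_dir="out",                               # placeholder
    fsdp="full_shard auto_wrap",                    # shard params/grads/optimizer state
    fsdp_transformer_layer_cls_to_wrap="T5Block",   # wrap each T5Block as one FSDP unit
    per_device_train_batch_size=1,
)
trainer = Seq2SeqTrainer(model=model, args=args, tokenizer=tokenizer)  # train_dataset omitted here
```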
-
For deep learning, when the model is large, model creation and initialization on the host device can take a tremendous amount of time and sometimes cause host OOM. The existing [torchdistx](https://github.com/py…
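For illustration, a minimal sketch of torchdistx-style deferred initialization (the module here is a stand-in for a large model):
```
# Construct the module without allocating real storage: deferred_init records
# the init operations, and materialize_module replays them on demand.
import torch
from torchdistx.deferred_init import deferred_init, materialize_module

model = deferred_init(torch.nn.Linear, 50_000, 50_000)  # no ~10 GB host allocation yet
# Later, e.g. after sharding decisions are made, allocate and initialize for real.
materialize_module(model)
```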
-
## 🐛 Bug
Got a RuntimeError when training a transformer from scratch under the `translation_multi_simple_epoch` task with fully sharded data parallel (FSDP).
### To Reproduce
Steps to reproduce the b…
-
## 🐛 Bug
There seems to be a discrepancy (in addition to https://github.com/pytorch/xla/issues/3718) in how `torch.nn.Linear` (`torch.nn.functional.linear`) is implemented and dispatched between th…
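As an illustration of the kind of check such a discrepancy invites, a hedged sketch comparing `torch.nn.functional.linear` against an explicit matmul-plus-bias on an XLA device (shapes are placeholders):
```
# Compare F.linear with a hand-written matmul + bias; a dispatch difference
# between backends can surface as a numerical mismatch here.
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(4, 8, device=device)
w = torch.randn(16, 8, device=device)
b = torch.randn(16, device=device)

out_linear = F.linear(x, w, b)   # backend may lower this to a fused kernel
out_manual = x @ w.t() + b       # explicit reference computation
print(torch.allclose(out_linear.cpu(), out_manual.cpu(), atol=1e-5))
```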
-
Is it possible to fine-tune the 7B model using 8×3090?
I had set:
```
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
```
but still got OOM:
torch.cuda.OutOfMemoryError: C…
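A batch size of 1 alone does not shard the model itself; as a hedged sketch, the settings below (from `transformers`' TrainingArguments, with illustrative values) are the usual levers for fitting a 7B model on 24 GB cards:
```
# Memory-saving levers that typically matter more than batch size here.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",                 # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,   # recover effective batch size
    gradient_checkpointing=True,      # trade recompute for activation memory
    bf16=True,                        # Ampere cards (3090) support bfloat16
    fsdp="full_shard auto_wrap",      # shard params/grads/optimizer state across the 8 GPUs
)
```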
-
### 🐛 Describe the bug
I have a model that contains some params that need to be ignored (otherwise `flat_param` will raise an error); the construction code is like:
```
not_trainable = []
…
```
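As a minimal sketch of one way to keep such parameters out of `flat_param`, FSDP's `ignored_modules` argument (the toy model and the choice of ignored module are illustrative, not from this report):
```
# Parameters of ignored modules are left out of FSDP's flat_param entirely.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")  # assumes launch via torchrun

class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.body = torch.nn.Linear(16, 16)
        self.frozen_head = torch.nn.Linear(16, 4)  # stands in for the not-trainable part
        self.frozen_head.requires_grad_(False)

model = ToyModel().cuda()
fsdp_model = FSDP(model, ignored_modules=[model.frozen_head])
```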
-
Recently, I have experimented with DPO training for Vietnamese. I start with a strong SFT model, which is [vinai/PhoGPT-4B-Chat](https://huggingface.co/vinai/PhoGPT-4B-Chat), and follow the method describe…
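For context, a hedged sketch of such a DPO run with TRL's `DPOTrainer` (the preference dataset and hyperparameters are placeholders, and exact keyword names vary across TRL versions):
```
# DPO fine-tuning sketch starting from the SFT checkpoint named above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("vinai/PhoGPT-4B-Chat")
tokenizer = AutoTokenizer.from_pretrained("vinai/PhoGPT-4B-Chat")
# Placeholder file with prompt/chosen/rejected columns.
train_dataset = load_dataset("json", data_files="prefs.jsonl")["train"]

config = DPOConfig(output_dir="phogpt-dpo", beta=0.1)  # beta scales the implicit KL penalty
trainer = DPOTrainer(
    model=model,                 # a frozen reference copy is created internally
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer= instead
)
trainer.train()
```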