-
When using ZeRO-3, what is the equivalent of PyTorch `FSDP`'s [auto-wrap policy](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.FullyShardedDataParallel)?
This policy lets users s…
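For reference, a minimal sketch of the FSDP auto-wrap policy being asked about, assuming a Transformer whose block class is `GPT2Block` (an illustrative choice, not from the original question):

```python
import functools

from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers.models.gpt2.modeling_gpt2 import GPT2Block

def wrap_model(model):
    # Wrap each GPT2Block in its own FSDP unit, so parameters are sharded
    # and gathered per block instead of for the whole model at once.
    policy = functools.partial(
        transformer_auto_wrap_policy,
        transformer_layer_cls={GPT2Block},
    )
    return FSDP(model, auto_wrap_policy=policy)
```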
-
Hi,
In the inference scripts, I see that there is no option to perform inference with FSDP.
Is `model.generate` not recommended when the model is wrapped in FSDP? Or in DDP?
Thanks
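For context, a sketch of one workaround pattern discussed for this situation, assuming the unsharded model fits in each rank's memory (`fsdp_model`, `ddp_model`, and `input_ids` are placeholders):

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def generate_with_fsdp(fsdp_model: FSDP, input_ids: torch.Tensor) -> torch.Tensor:
    # FSDP shards parameters outside of forward(), so gather them first,
    # then call generate on the inner (unwrapped) module.
    with FSDP.summon_full_params(fsdp_model):
        return fsdp_model.module.generate(input_ids, max_new_tokens=32)

# Under plain DDP the wrapper only intercepts forward(), so calling
# generate on the inner module is enough:
#     output_ids = ddp_model.module.generate(input_ids, max_new_tokens=32)
```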
-
### Bug description
It is expected that on a single GPU, the DDP and DeepSpeed strategies (e.g. `deepspeed_stage_1`, `deepspeed_stage_2`, and so on) should give exactly the same loss values (if the seed is fi…
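For context, a minimal sketch of such a comparison, assuming PyTorch Lightning; `MyLitModule` and `train_loader` are hypothetical placeholders:

```python
import pytorch_lightning as pl

# Run the same one-epoch fit under each strategy with a fixed seed,
# then compare the logged losses across runs.
for strategy in ("ddp", "deepspeed_stage_1", "deepspeed_stage_2"):
    pl.seed_everything(42, workers=True)  # re-seed before every run
    trainer = pl.Trainer(
        accelerator="gpu",
        devices=1,
        strategy=strategy,
        deterministic=True,  # prefer deterministic kernels where available
        max_epochs=1,
    )
    trainer.fit(MyLitModule(), train_loader)
```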
-
https://github.com/Alpha-VLLM/Lumina-mGPT/blob/c8e180aa20f0a5977bf168424f30aa2be58fad94/lumina_mgpt/model/modeling_xllmx_chameleon.py#L50
The mask should be calculated using the shifted labels (lab…
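For reference, the standard shifted-label pattern for causal-LM losses, sketched generically (not the repo's exact code): the valid-token mask has to be derived from the shifted labels, otherwise it is misaligned with the logits by one position.

```python
import torch
import torch.nn.functional as F

def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor,
                   ignore_index: int = -100) -> torch.Tensor:
    # Shift so that the logits at position t predict the token at t+1.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    # The mask must come from the *shifted* labels.
    mask = shift_labels != ignore_index
    per_token = F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=ignore_index,
        reduction="none",  # ignored positions contribute 0
    )
    # Average over valid tokens only.
    return per_token.sum() / mask.sum().clamp(min=1)
```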
-
### System Info
**Baseline.** On a single p4de.24xlarge instance (640 GB GPU, 1000 GB CPU), I am able to use QLoRA (4-bit) to train a large model with a size close to 300B. `device_map` is set as `au…
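For context, a minimal sketch of a 4-bit QLoRA setup with `device_map="auto"`, assuming `transformers`, `bitsandbytes`, and `peft`; the model id and LoRA hyperparameters are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model in 4-bit NF4, letting Accelerate spread the
# layers across the visible GPUs (and CPU, if needed).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "org/some-300b-model",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
# Attach trainable low-rank adapters on top of the frozen 4-bit weights.
model = get_peft_model(
    model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
)
```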
-
I checked this [issue](https://github.com/EleutherAI/lm-evaluation-harness/issues/714#top), which describes a problem similar to mine; however, using the latest main branch doesn't solve it!
## Model:
- F…
-
Liger (LinkedIn GPU Efficient Runtime) Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduce mem…
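For context, a minimal sketch of applying Liger's kernels to a Llama-family model, assuming the `liger_kernel` package's `transformers` patch helpers:

```python
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Monkey-patch the Llama modeling code with Liger's Triton kernels
# (RoPE, RMSNorm, SwiGLU, fused losses) before instantiating the model.
apply_liger_kernel_to_llama()
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```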
-
### 🐛 Describe the bug
When using 2D parallelism (FSDP + TP), I found that DCP hangs if I set `full_state_dict=True`. The reason I set `full_state_dict=True` is that the HuggingFace Trainer needs to save the full s…
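For context, a sketch of the full-state-dict gather being described, assuming the `torch.distributed.checkpoint.state_dict` helpers from recent PyTorch (`model` is a placeholder for the FSDP+TP model):

```python
import torch
import torch.distributed as dist
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_model_state_dict,
)

# Gather a full (unsharded) state dict so the HuggingFace Trainer can
# save it. This is a collective: every rank must reach this call, or
# the gather will hang.
options = StateDictOptions(full_state_dict=True, cpu_offload=True)
state_dict = get_model_state_dict(model, options=options)
if dist.get_rank() == 0:
    torch.save(state_dict, "model.pt")
```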
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior
Running ds_train_finetune.sh
root@074e2a33256d:~/ChatGLM-6B/ptuning# sh ds_train_finetune.…
-
### 🐛 Bug description
Command: accelerate launch python scripts/train_m3e.py /path/to/model_base/ /path/to/dataset/
Error:
/mnt/cache/zhaofufangchen/anaconda3/envs/m3e/lib/python3.10/site-packages/accelerate/d…