-
Quote from paper
> Our use of FSDP for Llama 3 shards optimizer states and gradients, but for model shards we do not reshard after forward computation to avoid an extra all-gather communication durin…
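For context, the behaviour the quote describes maps onto PyTorch FSDP's `SHARD_GRAD_OP` sharding strategy: optimizer states and gradients are sharded, but parameters are not resharded after forward, so backward needs no extra all-gather. A minimal sketch, assuming `model` is already constructed and distributed init is done; this is not the paper's actual code:

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# SHARD_GRAD_OP shards gradients and optimizer states but keeps parameters
# unsharded after the forward pass, avoiding the extra all-gather in backward
# (at the cost of higher peak memory than full sharding).
fsdp_model = FSDP(
    model,
    sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
)
```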
-
When adding the NVMe drives, we changed which cards are installed in nodes 01 and 02, and also removed the bifurcating PCI-e card from the Mellanox cards in nodes 09 and 10. We need to update the mach…
-
Hello,
We converted the paxml checkpoint and resumed training with the following config:
```
base_config: "base.yml"
tokenizer_path: "/dockerx/vocab/c4_en_301_5Mexp2_spm.model"
dataset_type: "tfds"
…
```
-
Hi, does mup support training with FSDP? I have a model training with DDP, but when switching to FSDP I get the following assertion:
`assert hasattr(self.weight, 'infshape'), (`
`AssertionError: Plea…`
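For reference, this is the order mup's own examples use: `set_base_shapes` attaches `.infshape` to each parameter of the raw module, and the mup optimizer reads it, so both steps have to happen before any parallel wrapper flattens the parameters. A rough sketch with `MyModel`, `target_kwargs`, and `base_kwargs` as placeholders; whether FSDP's flattened parameters still carry `infshape` afterwards is exactly what the assertion above is checking, so treat this as context rather than a confirmed fix:

```python
from mup import set_base_shapes, MuAdam
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

model = MyModel(**target_kwargs)          # the width you actually train
base = MyModel(**base_kwargs)             # same architecture at base width
set_base_shapes(model, base)              # attaches .infshape to every parameter
optimizer = MuAdam(model.parameters(), lr=1e-3)  # mup optimizer reads infshape here
model = FSDP(model)                       # wrap only after the mup setup is done
```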
-
I expected a training configuration with per_device_train_batch_size=1 and gradient_accumulation_steps=32 to yield the same (or a similar) result as per_device_train_batch_size=32 and gradient_accumulat…
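The usual equivalence argument is that each micro-batch loss is divided by the number of accumulation steps before `backward()`, so 1×32 and 32×1 see the same effective batch size and should match up to numerical noise, dropout RNG, and anything batch-size-dependent such as batch norm. A minimal hand-rolled sketch, with `model`, `loader`, and `optimizer` as placeholders:

```python
accum_steps = 32  # per_device_train_batch_size=1, gradient_accumulation_steps=32

optimizer.zero_grad()
for step, batch in enumerate(loader):
    loss = model(**batch).loss / accum_steps  # scale so the accumulated gradient
    loss.backward()                           # matches one batch of size 32
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```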
-
Hi,
In the FSDP [docs](https://github.com/huggingface/accelerate/blob/main/docs/source/usage_guides/fsdp.md) it says:
> When using transformers `save_pretrained`, pass `state_dict=accelerator.ge…
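The full pattern that sentence describes looks roughly like the following; a sketch based on the linked docs, assuming `accelerator`, `model`, and `output_dir` are already set up:

```python
# Gather the full (unsharded) state dict across FSDP ranks, then save it with
# transformers' save_pretrained on the unwrapped model.
state_dict = accelerator.get_state_dict(model)
unwrapped_model = accelerator.unwrap_model(model)
unwrapped_model.save_pretrained(
    output_dir,
    is_main_process=accelerator.is_main_process,
    save_function=accelerator.save,
    state_dict=state_dict,
)
```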
-
Hi, thanks for sharing the amazing work.
I was wondering if you could provide more details on how to train OSM with FSDP.
I set args.fsdp to True and get the following error:
AttributeError: 'O…
-
### System Info
- `transformers` version: 4.42.0
- Platform: Linux-5.15.0-105-generic-x86_64-with-glibc2.35
- Python version: 3.9.19
- Huggingface_hub version: 0.23.4
- Safetensors version: 0.4…
-
PyTorch version too old for fused optimizer
```
llm-full-mp-gpus.0 [stderr] [rank0]: Traceback (most recent call last):
llm-full-mp-gpus.0 [stderr] [rank0]: File "/homes/delaunap/milabench/benc…
```
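One way to sidestep this class of failure is to only pass `fused=True` when the installed PyTorch actually accepts the flag (fused CUDA AdamW appeared around PyTorch 1.13). A small sketch, with `model` as a placeholder:

```python
import inspect
import torch

# Only request the fused kernel when this PyTorch build exposes the flag;
# older versions raise if they see an unexpected `fused` keyword.
fused_available = "fused" in inspect.signature(torch.optim.AdamW).parameters
extra = {"fused": True} if fused_available and torch.cuda.is_available() else {}
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, **extra)
```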
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…