-
**Feature Overview (aka. Goal Summary)**
Implement Intel Gaudi support in the InstructLab project so that Gaudi 2 and Gaudi 3 accelerators can be used for SDG, evaluation, and training.
**Goals (aka. expected user out…
-
# Modifying parameters of FSDP-wrapped module by hand without summon_full_params context
## Issue description
I am training a large language model using FSDP.
I want to store EMA weights wh…
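The EMA bookkeeping the question asks about boils down to a simple per-parameter update. A minimal single-process sketch of that math with plain Python dicts (the names are illustrative; under FSDP each rank would apply this to its local shard, or first gather full parameters via the `summon_full_params` context the title mentions):

```python
def ema_update(ema, params, decay=0.999):
    """In-place exponential moving average: ema <- decay * ema + (1 - decay) * params."""
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value
    return ema

# Toy example: EMA of a single "weight" decaying toward a fixed parameter value.
ema = {"w": 1.0}
params = {"w": 0.0}
for _ in range(3):
    ema_update(ema, params, decay=0.9)
# After 3 steps the EMA is 0.9**3 = 0.729
```

The FSDP-specific difficulty is not this update but where the full parameters live; the sketch only shows the arithmetic applied once they are accessible.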
-
### 🐛 Describe the bug
Hi, I've changed PyTorch's FSDP+TP example to use the HF T5 model and ran it on 3 nodes with 2 GPUs each (6 GPUs total).
### commands
`NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT NCCL_IB_CUDA_SUPP…
-
I was able to train the Llama3-8b model with Thunder for a few steps and then save it. However, when I later try to use `litgpt generate` or `litgpt chat` with the saved checkpoint, I get an error about si…
-
### Bug description
While writing the new FSDP guide for Trainer in #18326, I observed suspiciously slow iteration speed when enabling CPU offload (see https://github.com/Lightning-AI/lightning/pull/18326#…
-
## 🚀 Feature
### Pitch
Port https://github.com/pytorch/pytorch/blob/c4a157086482899f0640d03292e5d2c9a6a3db68/torch/distributed/fsdp/fully_sharded_data_parallel.py#L1069-L1194 to work with Thunde…
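For context, the FSDP method referenced above computes a global gradient norm across all shards and rescales the gradients if it exceeds the limit. The core math, with the sharding and cross-rank all-reduce omitted (a single-process sketch, not the actual port), is roughly:

```python
import math

def clip_grad_norm_(grads, max_norm, eps=1e-6):
    """Scale gradients in place so their global L2 norm is at most max_norm.
    Mirrors the clipping math; in FSDP the squared local norms would be
    all-reduced across ranks before taking the square root."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        for i in range(len(grads)):
            grads[i] *= clip_coef
    return total_norm

grads = [3.0, 4.0]                      # global norm = 5
norm = clip_grad_norm_(grads, max_norm=1.0)
```

The Thunder port's real work is making the norm computation aware of each rank's flattened shard; the scaling step itself is unchanged.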
-
In #3740, we added support for FullyShardedDataParallel, but limited the implementation to ZeRO-2, not ZeRO-3. ZeRO-3 yields substantially lower memory usage than ZeRO-2 while bringi…
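As a rough illustration of the gap between the two stages: the ZeRO paper's accounting for mixed-precision Adam charges about 2 + 2 + 12 bytes per parameter (fp16 params, fp16 grads, fp32 optimizer state), and each successive stage shards one more of those across ranks. A back-of-the-envelope sketch (an approximation, ignoring activations and communication buffers):

```python
def zero_bytes_per_rank(n_params, n_ranks, stage):
    """Approximate per-rank memory (bytes) for mixed-precision Adam under ZeRO.
    Stage 1 shards optimizer state, stage 2 also shards grads,
    stage 3 also shards the parameters themselves."""
    params, grads, opt = 2 * n_params, 2 * n_params, 12 * n_params
    if stage >= 1:
        opt /= n_ranks
    if stage >= 2:
        grads /= n_ranks
    if stage >= 3:
        params /= n_ranks
    return params + grads + opt

# 1B parameters on 8 ranks: stage 2 keeps a full 2 GB parameter copy per rank,
# stage 3 shards it away.
stage2 = zero_bytes_per_rank(1e9, 8, stage=2)   # 3.75e9 bytes
stage3 = zero_bytes_per_rank(1e9, 8, stage=3)   # 2.0e9 bytes
```

At larger rank counts the stage-3 figure keeps shrinking toward 16 bytes / n_ranks per parameter, while stage 2 stays floored at the 2-byte full parameter copy.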
-
### System Info
```Shell
Please see
https://github.com/huggingface/peft/issues/484#issue-1718704717
```
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks…
-
Hey Team,
I'm trying to use FSDP1/2 with Float8InferenceLinear, but it seems to have some issues (with torch 2.3.1+cu118). Do you suggest bumping to a higher version of torch and trying again, or maybe use …
-
### 🐛 Describe the bug
When using FSDP sharded checkpointing, I am seeing unusually high CPU RAM usage. Specifically, let's say I am training a 7B parameter model and checkpoints are saved in `floa…
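For scale, and assuming the gather happens in float32 (the exact dtype is cut off above), materializing even one full, unsharded copy of a 7B-parameter state dict on the host is already substantial, before counting serialization buffers or per-rank duplication:

```python
def full_state_dict_gib(n_params, bytes_per_param):
    """CPU RAM (GiB) needed to hold one full, unsharded copy of the weights."""
    return n_params * bytes_per_param / 2**30

# 7B parameters at 4 bytes each is roughly 26 GiB for a single copy.
ram_gib = round(full_state_dict_gib(7e9, 4), 1)
```

If several ranks on the same node each assemble such a copy, the observed RAM usage multiplies accordingly, which is one common explanation for numbers far above the checkpoint's on-disk size.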