-
While training the 3.6B and 7B models with FSDP we experienced a loss spike as the model was moving towards convergence.
Things that we should check in our implementation:
- [x] Co…
-
I am trying to use the new FSDP feature in torch 2.1 where requires_grad does not need to be uniform across a block.
``` python
model = AutoModelForCausalLM.from_pretrained(
'some_lla…
```
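A minimal sketch of the pattern being described (the model checkpoint and the choice of frozen parameters are placeholders, not the poster's actual setup): with `use_orig_params=True`, torch 2.1's FSDP accepts a wrapped block whose parameters have mixed `requires_grad`.

``` python
# Sketch only: freeze part of a block and wrap with FSDP using use_orig_params=True,
# which since torch 2.1 allows non-uniform requires_grad within a wrapped unit.
# Assumes launch via torchrun so the process-group environment variables are set.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint

# Freeze, e.g., the embedding weights while leaving the rest trainable.
for name, param in model.named_parameters():
    if "embed" in name:
        param.requires_grad = False

model = FSDP(
    model,
    use_orig_params=True,  # needed for mixed requires_grad inside one block
    device_id=torch.cuda.current_device(),
)
```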
-
## ❓ Questions and Help
### Before asking:
1. search the issues.
2. search the docs.
#### What is your question?
I am trying to train a transformer with a 100-layer encoder and a 100-layer decoder. Be…
-
## 🚀 Feature
[Documentation says](https://lightning.ai/docs/pytorch/latest/advanced/compile.html#limitations) that torch.compile is not supported with distributed training right now. Since torch co…
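For reference, plain PyTorch can already combine torch.compile with DDP, which is roughly the behaviour this request asks Lightning to expose. A minimal sketch (names and shapes are illustrative, and it assumes a torchrun launch):

``` python
# Sketch: torch.compile applied to a DDP-wrapped module in plain PyTorch.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 1024).cuda()
model = DDP(model, device_ids=[local_rank])
model = torch.compile(model)  # compile the DDP-wrapped module

x = torch.randn(8, 1024, device="cuda")
loss = model(x).sum()
loss.backward()
```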
-
### 🚀 The feature, motivation and pitch
Currently FSDP rejects tensor parameters with dtype uint8. is_floating_point() only allows one of (torch.float64, torch.float32, torch.float16, and …
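A quick illustration of the check in question: uint8 tensors are not floating point, so any parameter validation based on is_floating_point() rejects them.

``` python
import torch

print(torch.zeros(4, dtype=torch.uint8).is_floating_point())    # False
print(torch.zeros(4, dtype=torch.float16).is_floating_point())  # True
print(torch.zeros(4, dtype=torch.bfloat16).is_floating_point()) # True
```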
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
I have installed all the requirements for Qwen2-vl.
### Reproduction
train_mm_proj_only:True
Hello, I wan…
-
It is mentioned in the README that candle supports multi-GPU inference, using NCCL under the hood. How can this be implemented? I wonder if there is any available example to look at.
Also, I know PyT…
-
FSDP2 provides a smaller memory footprint, compatibility with torch.compile, and more flexibility thanks to per-parameter sharding. Does huggingface have plans to support FSDP2?
https://github.com/pytorch/to…
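For context, FSDP2's per-parameter sharding is driven by the fully_shard API rather than a wrapper class. A minimal sketch; the import path and defaults vary across torch releases, so treat this as illustrative rather than the exact Hugging Face integration:

``` python
# Sketch of FSDP2-style per-parameter sharding; the import lives under
# torch.distributed._composable.fsdp in earlier releases and
# torch.distributed.fsdp.fully_shard in newer ones.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=4
).cuda()

# Shard each layer, then the root module; parameters are sharded individually.
for layer in model.layers:
    fully_shard(layer)
fully_shard(model)
```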
-
I am currently using FSDP (Fully Sharded Data Parallel) with the Llama 2 70B model. Training has begun, but I encounter an error when attempting to save the checkpoint at e…
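A common workaround sketch for checkpointing large FSDP models (not the poster's exact code): gather a full state dict on rank 0 with CPU offload before saving.

``` python
# Sketch: save a full state dict from an FSDP-wrapped model on rank 0 only,
# offloading to CPU to avoid gathering a 70B-parameter model on one GPU.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    FullStateDictConfig,
    StateDictType,
)

def save_full_checkpoint(model: FSDP, path: str) -> None:
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        state_dict = model.state_dict()
    if dist.get_rank() == 0:
        torch.save(state_dict, path)
```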
-
(for later)