-
This is a bit of a technical challenge and/or question. Both I-JEPA and V-JEPA use DDP rather than FSDP. Since DDP replicates the full model on every GPU, this puts an inherent cap on the size of the models that can be used: the memory of a single GPU.
I'm …
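To make the ask concrete, here is a hedged sketch of what switching from DDP to an FSDP wrap could look like, so that parameters, gradients, and optimizer state are sharded across ranks instead of replicated. The model constructor and the wrap policy threshold are placeholders, not the actual JEPA code:

```python
# Hedged sketch: FSDP wrapping instead of DDP. Module names and the
# min_num_params threshold are placeholders, not the real JEPA modules.
import functools
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

dist.init_process_group("nccl")  # typically launched via torchrun
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = build_encoder()  # placeholder for the actual model constructor

# Shard any submodule above ~1M parameters; tune for the real architecture.
wrap_policy = functools.partial(size_based_auto_wrap_policy, min_num_params=int(1e6))
model = FSDP(model.cuda(), auto_wrap_policy=wrap_policy)
```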
-
I am trying to scale fine-tuning from a single GPU up to multi-node distributed training for the Llama3-70B and Llama3-8B models.
Below is my training configuration; a code sketch of it follows the list:
SFT (Llama3 8B & 70B)
Epochs: 3
Gradient Accumulatio…
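For concreteness, here is a hedged sketch of how this configuration might map onto Hugging Face `TrainingArguments`. The gradient-accumulation value is a placeholder (the real number is cut off above), and the FSDP settings themselves would normally live in a separate accelerate/FSDP config:

```python
# Hedged sketch of the SFT configuration above as transformers TrainingArguments.
# gradient_accumulation_steps is a placeholder; the real value is truncated above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama3-sft",          # placeholder path
    num_train_epochs=3,               # "Epochs: 3" from the config above
    gradient_accumulation_steps=8,    # placeholder
    per_device_train_batch_size=1,    # assumption, chosen for the 70B case
    bf16=True,                        # assumption, common on A100/H100
)
```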
-
### 🐛 Describe the bug
When trying to train both the LoRA layers on the base model and also setting `modules_to_save` on the LoRA config, which makes the embedding layers trainable (my assumption is it also ap…
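For reference, a minimal sketch of the kind of setup I mean, assuming PEFT's `LoraConfig`; the model name, rank, and target modules are illustrative only:

```python
# Hedged sketch: LoRA adapters plus fully-trainable embedding/output layers
# via modules_to_save. Model name and hyperparameters are illustrative.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")  # assumption
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],          # illustrative
    modules_to_save=["embed_tokens", "lm_head"],  # makes these layers fully trainable
)
model = get_peft_model(model, config)
```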
-
### System Info
- `transformers` version: 4.45.0.dev0
- Platform: Linux-5.15.0-1027-gcp-x86_64-with-glibc2.31
- Python version: 3.9.19
- Huggingface_hub version: 0.24.5
- Safetensors version: 0…
-
### System Info
- `transformers` version: 4.40.1
- Platform: Linux-5.15.148.2-2.cm2-x86_64-with-glibc2.35
- Python version: 3.10.2
- Huggingface_hub version: 0.23.0
- Safetensors version: 0.4.2…
-
Hi team, great work!
QDoRA seems to perform better than QLoRA; see [Efficient finetuning of Llama 3 with FSDP QDoRA](https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.html).
I wonder w…
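If it helps frame the question: QDoRA roughly amounts to a 4-bit quantized base model plus DoRA-style adapters. A hedged sketch with current Hugging Face APIs, assuming a `peft` version recent enough to have the `use_dora` flag; all hyperparameters are illustrative:

```python
# Hedged sketch: QDoRA ~= 4-bit quantized base model + DoRA adapters.
# Model name and hyperparameters are illustrative, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B", quantization_config=bnb  # assumption
)
config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
    use_dora=True,  # DoRA on top of the quantized base
)
model = get_peft_model(model, config)
```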
-
### 🐛 Describe the bug
Flex attention under FSDP works without `torch.compile`, but not with it. The key error seems to be `ValueError: Pointer argument (at 2) cannot be accessed from Triton (cpu tensor?)`…
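A minimal repro sketch of the setup I mean, assuming PyTorch >= 2.5 (where `flex_attention` lives in `torch.nn.attention.flex_attention`) and a `torchrun` launch; shapes and the projection layer are arbitrary:

```python
# Hedged repro sketch: flex_attention inside an FSDP-wrapped module,
# compiled with torch.compile. Launch with torchrun; shapes are arbitrary.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.nn.attention.flex_attention import flex_attention

class Attn(torch.nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)  # gives FSDP something to shard

    def forward(self, q, k, v):
        return self.proj(flex_attention(q, k, v))

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = FSDP(Attn().cuda())
model = torch.compile(model)  # runs fine without this line, fails with it

q = k = v = torch.randn(2, 8, 128, 64, device="cuda")  # (B, H, S, D)
out = model(q, k, v)
```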
-
### 🚀 The feature, motivation and pitch
In FSDP1 there is the `FSDP.summon_full_params` [function](https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.FullyShardedDataParallel.summon_ful…
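For context, this is the FSDP1 pattern I would like an FSDP2 equivalent for; a minimal usage sketch, with model construction left as a placeholder:

```python
# FSDP1 pattern: temporarily gather the full (unsharded) parameters on each
# rank, e.g. to inspect weights or compute a consolidated statistic.
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

model = FSDP(build_model().cuda())  # build_model is a placeholder

with FSDP.summon_full_params(model, writeback=False):
    # Inside this block every parameter is unsharded on this rank.
    total = sum(p.numel() for p in model.parameters())
    print(f"full parameter count: {total}")
```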
-
Thanks for the excellent work!
When I try to train a 4-step SDXL model (2 nodes, 16 GPUs), I get an error:
`[rank2]: Traceback (most recent call last):
[rank2]: File "/mnt/nas/gaohl/project/DMD2-mai…
-
### ❓ The question
Quick question: is there an example script and YAML file that turns off FSDP completely? (I want to use DDP.)
I am running it with a 7B model on an A100 80GB. I guess this w…
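In case it clarifies what I'm after, a framework-agnostic sketch of the plain-DDP setup I'd like the script/config to produce; the model constructor is a placeholder:

```python
# Hedged sketch: plain DDP (full model replica per GPU) instead of FSDP.
# Optimizer-state memory is the main constraint at 7B on a single 80GB card.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

dist.init_process_group("nccl")  # launched via torchrun
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

model = build_model().to(torch.bfloat16).cuda()  # build_model is a placeholder
model = DDP(model, device_ids=[local_rank])
```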