-
I tried using an FSDP config like this for accelerate (taken from https://github.com/kohya-ss/sd-scripts/issues/1480#issuecomment-2301283660) to fine-tune SDXL. The UI is bmaltais/kohya_ss.
```python
…
```
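For reference, a minimal sketch of what the same FSDP settings look like through accelerate's Python API. This is not the exact config from the linked comment, and kohya_ss normally reads these values from an `accelerate config` file; the field values below are assumptions for illustration:

```python
import torch
from torch import nn
from torch.distributed.fsdp import ShardingStrategy
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Run under `accelerate launch` so FSDP has a multi-GPU process group.
fsdp_plugin = FullyShardedDataParallelPlugin(
    sharding_strategy=ShardingStrategy.FULL_SHARD,  # shard params, grads, optimizer state
    use_orig_params=True,       # keep original parameter handles (needed for param groups)
    sync_module_states=True,    # broadcast rank-0 weights when wrapping
)
accelerator = Accelerator(mixed_precision="bf16", fsdp_plugin=fsdp_plugin)

model = nn.Linear(1024, 1024)   # stand-in for the SDXL UNet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model, optimizer = accelerator.prepare(model, optimizer)
```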
-
Hello!
I am currently trying to LoRA fine-tune a Llama 3.1 70B Nemotron Instruct LLM by slightly tweaking the Llama 3.1 70B LoRA configs.
According to the memory stats required by torchtune, …
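As a rough back-of-envelope (these are not torchtune's own reported numbers, and the adapter fraction is an illustrative assumption), the dominant memory cost for LoRA on a 70B base is the frozen bf16 weights; the trainable adapters and their AdamW state are comparatively tiny:

```python
# Rough back-of-envelope, not torchtune's reported memory stats.
base_params = 70e9                   # Llama 3.1 70B (the Nemotron variant is the same size)
lora_params = 0.005 * base_params    # illustrative assumption: adapters ~0.5% of base

base_weights_gb = base_params * 2 / 1e9   # frozen base in bf16 (2 bytes/param)
adapter_gb      = lora_params * 2 / 1e9   # trainable LoRA weights in bf16
adamw_state_gb  = lora_params * 8 / 1e9   # two fp32 moments per trainable param

print(f"frozen base weights: ~{base_weights_gb:,.0f} GB total (sharded across ranks under FSDP)")
print(f"LoRA adapters:       ~{adapter_gb:.1f} GB, AdamW state: ~{adamw_state_gb:.1f} GB")
# Activations, adapter gradients, and framework overhead come on top of this.
```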
-
1. For reranker training, LLM-style models default to DeepSpeed while BERT-style models default to FSDP. How can BERT-style models be trained with DeepSpeed, and could a usage example be added?
2. Likewise, for embedding training, how can DeepSpeed be used? Could a usage example be added there as well? Thanks!
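Assuming the BERT-style reranker/embedding scripts build on Hugging Face `Trainer`/`TrainingArguments` (an assumption, since the repo's internals aren't shown here), DeepSpeed can typically be enabled by passing a ZeRO config directly:

```python
from transformers import TrainingArguments

# Minimal ZeRO-2 config; "auto" values are filled in by the HF <-> DeepSpeed integration.
ds_config = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="./reranker_out",
    per_device_train_batch_size=8,
    bf16=True,
    deepspeed=ds_config,   # a path to a JSON file also works; requires `deepspeed` installed
)
# Launch the training script with `deepspeed train.py ...` or `torchrun ...`.
```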
-
On `PP + FSDP` and `PP + TP + FSDP`:
- Is there any documentation on how these different parallelisms compose? (a rough sketch of the usual composition follows after this list)
- What are the largest training runs these strategies have been tested on?
- Are there…
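For illustration only, a rough sketch of how the three dimensions are commonly composed with a 3-D `DeviceMesh`. This is a generic pattern under assumed mesh sizes, not documentation of any particular run:

```python
import os
import torch
from torch import nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel, parallelize_module

# Launch with `torchrun --nproc_per_node=8 script.py` so the 2x2x2 mesh can be built.
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
mesh = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("pp", "dp", "tp"))

block = nn.Sequential(nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256)).cuda()

# TP first: shard the two projection matrices over the "tp" sub-mesh.
parallelize_module(block, mesh["tp"], {"0": ColwiseParallel(), "2": RowwiseParallel()})

# Then FSDP over the "dp" sub-mesh, so each data-parallel group further shards
# the (already tensor-parallel) parameters.
block = FSDP(block, device_mesh=mesh["dp"], use_orig_params=True)

# Pipeline parallelism would additionally split the layer stack into stages along
# mesh["pp"] (e.g. with torch.distributed.pipelining); that part is not shown here.
```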
-
Given there are so many LLM-based models at the top of the MTEB benchmark nowadays, is there a canonical way to train them with FSDP now? I'm trying to explore in this direction, but I just want to ask if there…
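One common pattern (not necessarily a canonical one) is plain PyTorch FSDP with a transformer auto-wrap policy around the decoder blocks of the embedding backbone; a hedged sketch, with the model name chosen purely as an example:

```python
import functools
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModel
from transformers.models.mistral.modeling_mistral import MistralDecoderLayer

# Launch with torchrun; each decoder block becomes its own FSDP unit.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModel.from_pretrained(
    "intfloat/e5-mistral-7b-instruct", torch_dtype=torch.bfloat16
)
wrap_policy = functools.partial(
    transformer_auto_wrap_policy, transformer_layer_cls={MistralDecoderLayer}
)
model = FSDP(
    model,
    auto_wrap_policy=wrap_policy,
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),
    use_orig_params=True,
    device_id=torch.cuda.current_device(),
)
# Pooling (e.g. last-token) and the contrastive loss sit on top of this as usual.
```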
-
### Bug description
I have a sharded checkpoint which was saved via `trainer.save_checkpoint("/path/to/cp/dir/", weights_only=False)` and which I am trying to load during test via `trainer.test(dataloade…
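For reference, a minimal sketch of the load path being attempted, assuming the checkpoint directory was written under `FSDPStrategy(state_dict_type="sharded")` and is handed back via `ckpt_path` (the module and dataloader are placeholders):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import lightning as L
from lightning.pytorch.strategies import FSDPStrategy


class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def test_step(self, batch, batch_idx):
        x, y = batch
        self.log("test_loss", nn.functional.cross_entropy(self.layer(x), y))


model = LitModel()
test_loader = DataLoader(
    TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,))), batch_size=8
)

# Match the strategy / state_dict_type used when the sharded checkpoint was written,
# then point ckpt_path at the checkpoint *directory*.
trainer = L.Trainer(
    accelerator="gpu",
    devices=2,
    strategy=FSDPStrategy(state_dict_type="sharded"),
)
trainer.test(model, dataloaders=test_loader, ckpt_path="/path/to/cp/dir/")
```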
-
It is useful to shard optimizer state across devices (to save significant memory). This reflects current practice. We want to support it.
* We want to switch from no sharding to naive model parameter…
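As an illustration of the idea (not necessarily the design being proposed here), PyTorch's `ZeroRedundancyOptimizer` already partitions optimizer state across DDP ranks; a minimal sketch:

```python
import torch
from torch import nn
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with torchrun so a default process group exists.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = DDP(nn.Linear(4096, 4096).cuda())

# Each rank keeps the AdamW moments only for its own shard of the parameters,
# cutting optimizer-state memory roughly by the world size.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.AdamW,
    lr=1e-4,
)
```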
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
We want to verify that FSDP works in the following scenarios:
- [x] #203
- [x] #204
- [x] #205
- [x] #206
- [x] #207
- [x] #208
-
### System Info
- Platform: Linux-6.8.0-47-generic-x86_64-with-glibc2.35
- Python version: 3.10.15
- PyTorch version: 2.4.0
- CUDA device(s): NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H…