-
I tried using an FSDP config like this for accelerate (taken from https://github.com/kohya-ss/sd-scripts/issues/1480#issuecomment-2301283660) to fine-tune SDXL. The UI is bmaltais/kohya_ss.
```python
…
```
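For reference, a minimal sketch of what the same FSDP settings look like through accelerate's Python API. This is not the exact config from the linked comment, and kohya_ss normally reads these values from an `accelerate config` file; the field values below are assumptions for illustration:

```python
import torch
from torch import nn
from torch.distributed.fsdp import ShardingStrategy
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Run under `accelerate launch` so FSDP has a multi-GPU process group.
fsdp_plugin = FullyShardedDataParallelPlugin(
    sharding_strategy=ShardingStrategy.FULL_SHARD,  # shard params, grads, optimizer state
    use_orig_params=True,       # keep original parameter handles (needed for param groups)
    sync_module_states=True,    # broadcast rank-0 weights when wrapping
)
accelerator = Accelerator(mixed_precision="bf16", fsdp_plugin=fsdp_plugin)

model = nn.Linear(1024, 1024)   # stand-in for the SDXL UNet
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model, optimizer = accelerator.prepare(model, optimizer)
```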
-
Hello!
I am currently trying to LoRA fine-tune a Llama 3.1 70B Nemotron Instruct LLM by slightly tweaking the Llama 3.1 70B LoRA configs.
According to the memory stats required by torchtune, …
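As a rough back-of-envelope (these are not torchtune's own reported numbers, and the adapter fraction is an illustrative assumption), the dominant memory cost for LoRA on a 70B base is the frozen bf16 weights; the trainable adapters and their AdamW state are comparatively tiny:

```python
# Rough back-of-envelope, not torchtune's reported memory stats.
base_params = 70e9                   # Llama 3.1 70B (the Nemotron variant is the same size)
lora_params = 0.005 * base_params    # illustrative assumption: adapters ~0.5% of base

base_weights_gb = base_params * 2 / 1e9   # frozen base in bf16 (2 bytes/param)
adapter_gb      = lora_params * 2 / 1e9   # trainable LoRA weights in bf16
adamw_state_gb  = lora_params * 8 / 1e9   # two fp32 moments per trainable param

print(f"frozen base weights: ~{base_weights_gb:,.0f} GB total (sharded across ranks under FSDP)")
print(f"LoRA adapters:       ~{adapter_gb:.1f} GB, AdamW state: ~{adamw_state_gb:.1f} GB")
# Activations, adapter gradients, and framework overhead come on top of this.
```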
-
1. For reranker training, LLM-style models default to DeepSpeed while BERT-style models default to FSDP. How can BERT-style models be trained with DeepSpeed, and could a usage example be added?
2. Likewise, for embedding training, how can DeepSpeed be used? Could a usage example be added there as well? Thanks!
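Assuming the BERT-style reranker/embedding scripts build on Hugging Face `Trainer`/`TrainingArguments` (an assumption, since the repo's internals aren't shown here), DeepSpeed can typically be enabled by passing a ZeRO config directly:

```python
from transformers import TrainingArguments

# Minimal ZeRO-2 config; "auto" values are filled in by the HF <-> DeepSpeed integration.
ds_config = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="./reranker_out",
    per_device_train_batch_size=8,
    bf16=True,
    deepspeed=ds_config,   # a path to a JSON file also works; requires `deepspeed` installed
)
# Launch the training script with `deepspeed train.py ...` or `torchrun ...`.
```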
-
On `PP + FSDP` and `PP + TP + FSDP`:
- Is there any documentation on how these different parallelisms compose? (a rough sketch of the usual composition follows after this list)
- What are the largest training runs these strategies have been tested on?
- Are there…
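For illustration only, a rough sketch of how the three dimensions are commonly composed with a 3-D `DeviceMesh`. This is a generic pattern under assumed mesh sizes, not documentation of any particular run:

```python
import os
import torch
from torch import nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel, parallelize_module

# Launch with `torchrun --nproc_per_node=8 script.py` so the 2x2x2 mesh can be built.
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
mesh = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("pp", "dp", "tp"))

block = nn.Sequential(nn.Linear(256, 1024), nn.GELU(), nn.Linear(1024, 256)).cuda()

# TP first: shard the two projection matrices over the "tp" sub-mesh.
parallelize_module(block, mesh["tp"], {"0": ColwiseParallel(), "2": RowwiseParallel()})

# Then FSDP over the "dp" sub-mesh, so each data-parallel group further shards
# the (already tensor-parallel) parameters.
block = FSDP(block, device_mesh=mesh["dp"], use_orig_params=True)

# Pipeline parallelism would additionally split the layer stack into stages along
# mesh["pp"] (e.g. with torch.distributed.pipelining); that part is not shown here.
```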
-
Given there are so many LLM-based models at the top of the MTEB benchmark nowadays, is there a canonical way to train them with FSDP now? I'm trying to explore in this direction, but I just want to ask if there…
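One common pattern (not necessarily a canonical one) is plain PyTorch FSDP with a transformer auto-wrap policy around the decoder blocks of the embedding backbone; a hedged sketch, with the model name chosen purely as an example:

```python
import functools
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers import AutoModel
from transformers.models.mistral.modeling_mistral import MistralDecoderLayer

# Launch with torchrun; each decoder block becomes its own FSDP unit.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = AutoModel.from_pretrained(
    "intfloat/e5-mistral-7b-instruct", torch_dtype=torch.bfloat16
)
wrap_policy = functools.partial(
    transformer_auto_wrap_policy, transformer_layer_cls={MistralDecoderLayer}
)
model = FSDP(
    model,
    auto_wrap_policy=wrap_policy,
    mixed_precision=MixedPrecision(param_dtype=torch.bfloat16),
    use_orig_params=True,
    device_id=torch.cuda.current_device(),
)
# Pooling (e.g. last-token) and the contrastive loss sit on top of this as usual.
```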
-
### Bug description
I have a sharded checkpoint which was saved via `trainer.save_checkpoint("/path/to/cp/dir/", weights_only=False)` and which I am trying to load during test via `trainer.test(dataloade…
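For reference, a minimal sketch of the load path being attempted, assuming the checkpoint directory was written under `FSDPStrategy(state_dict_type="sharded")` and is handed back via `ckpt_path` (the module and dataloader are placeholders):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import lightning as L
from lightning.pytorch.strategies import FSDPStrategy


class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(32, 2)

    def test_step(self, batch, batch_idx):
        x, y = batch
        self.log("test_loss", nn.functional.cross_entropy(self.layer(x), y))


model = LitModel()
test_loader = DataLoader(
    TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,))), batch_size=8
)

# Match the strategy / state_dict_type used when the sharded checkpoint was written,
# then point ckpt_path at the checkpoint *directory*.
trainer = L.Trainer(
    accelerator="gpu",
    devices=2,
    strategy=FSDPStrategy(state_dict_type="sharded"),
)
trainer.test(model, dataloaders=test_loader, ckpt_path="/path/to/cp/dir/")
```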
-
It is useful to shard optimizer state across devices (to save significant memory). This reflects current practice. We want to support it.
* We want to switch from no sharding to naive model parameter…
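As an illustration of the idea (not necessarily the design being proposed here), PyTorch's `ZeroRedundancyOptimizer` already partitions optimizer state across DDP ranks; a minimal sketch:

```python
import torch
from torch import nn
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with torchrun so a default process group exists.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = DDP(nn.Linear(4096, 4096).cuda())

# Each rank keeps the AdamW moments only for its own shard of the parameters,
# cutting optimizer-state memory roughly by the world size.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(),
    optimizer_class=torch.optim.AdamW,
    lr=1e-4,
)
```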
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports.
### Exp…
-
We want to verify that FSDP works in the following scenarios:
- [x] #203
- [x] #204
- [x] #205
- [x] #206
- [x] #207
- [x] #208
-
### System Info
- Platform: Linux-6.8.0-47-generic-x86_64-with-glibc2.35
- Python version: 3.10.15
- PyTorch version: 2.4.0
- CUDA device(s): NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3, NVIDIA H…