-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…
-
### 🚀 The feature, motivation and pitch
Is there a plan to add FP8 support for training?
### Alternatives
_No response_
### Additional context
_No response_
-
### 🐛 Describe the bug
I used FSDP + ShardedGradScaler to train my model. Compared with apex.amp + DDP, my model's accuracy has decreased.
The DDP setup looks like:
```
model, optimizer = amp.initial…
```
-
Hello, I saw in the paper that you trained on 32 GPUs during the pretraining stage. How can I do multi-node training with the Trainer's FSDP integration? For example, if I want to train on 16 A100s across 2 nodes, how should I set this up with the Trainer, and will the model be sharded across all 16 GPUs?
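One common way to launch such a run (a sketch; the script name, flags, rendezvous address, and port below are placeholders, not taken from the paper) is `torchrun` invoked once per node:

```shell
# Run on each of the 2 nodes; set --node_rank=0 on the first node and 1 on the second.
# train.py and <master-node-ip> are placeholders for illustration.
torchrun \
  --nnodes=2 \
  --nproc_per_node=8 \
  --node_rank=0 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=<master-node-ip>:29500 \
  train.py
```

With full-shard FSDP, parameters are sharded across all 16 ranks, so each GPU holds roughly 1/16 of the model's parameter shards (activations and optimizer-state shards live alongside them).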
-
### 🚀 The feature, motivation and pitch
As of 1.12, only limited shared-parameter support exists for FSDP, i.e., shared parameters must be part of the same FSDP unit; users cannot share parameters if their respective…
-
### 🐛 Describe the bug
The model I want to train is more stable with EMA. I want to apply FSDP to the model so that I can train much larger model sizes, with code like the below:
```py…
-
### 🚀 The feature, motivation and pitch
A good profiling tool appears to be lacking for both DDP and FSDP.
### Alternatives
None.
### Additional context
Something like Horovod Timeline but bette…
-
## Work Items
* Meta-device initialization / `_apply()` methods
- [x] Support initial meta-device initialization using `swap_tensors` path
- [ ] Remove manual padding logic after https://github…
-
Notice: In order to resolve issues more efficiently, please raise issue following the template.
## ❓ Questions and Help
How does Funasr export ONNX for pre-tra…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…