-
### 🚀 The feature, motivation and pitch
The ask is whether we can expose a way to construct a 1D device mesh from an existing process group.
For context, this is to ensure interoperability of per-…
wz337 updated
9 months ago
-
## Proposed refactor
Flatten the Strategy inheritance:
Part of #10416
### Motivation
Reduce coupling between strategies, reduce unintentional overrides/inheritance and avoid silent failures
…
-
For single-GPU training, every time I run the script I have to run `export SLURM_LOCALID=0`, `export SLURM_PROCID=0`, and `export SLURM_NNODES=1` before the training starts successfully. My question is f…
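A minimal sketch of one way to avoid retyping those exports, assuming the training script reads the three variables from the process environment at startup. The helper name `apply_single_gpu_defaults` is hypothetical and not part of any library. It only fills in values that are missing, so a real SLURM job keeps its own settings.

```python
import os

# Values taken from the exports quoted in the issue above.
SINGLE_GPU_DEFAULTS = {
    "SLURM_LOCALID": "0",
    "SLURM_PROCID": "0",
    "SLURM_NNODES": "1",
}


def apply_single_gpu_defaults(env=os.environ):
    """Hypothetical helper: set each variable only if it is not already set."""
    for name, value in SINGLE_GPU_DEFAULTS.items():
        # setdefault leaves values provided by a real SLURM launcher untouched.
        env.setdefault(name, value)
    return env


if __name__ == "__main__":
    apply_single_gpu_defaults()
    print({name: os.environ[name] for name in SINGLE_GPU_DEFAULTS})
```

Calling the helper at the very top of the training script, before the training framework is imported or initialized, is an assumption about when the variables are read; it may need to move earlier if they are inspected at import time.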
-
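My training set is very large (over a million samples), and loading it directly for training causes OOM, so I read the data in streaming mode, but training turned out to be very slow.
GPU utilization is very low,
while the CPU is fully saturated.
Training arguments: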
```
SftArguments(train_type='sft', model_type='internvl2-8b', model_revision='master', full_deter…
```
-
Thanks for your reply and advice in #105! I ran more experiments. I am trying to fine-tune on a real UR3 robot. Some key information:
- trained on one task: Move the black box right (randomized …
-
### 🐛 Describe the bug
I discovered that FSDP doesn't call `.to()` on submodules that have no parameters. This seems odd, since other modules get moved automatically even if they are not wrapped ex…
-
Please fix the following issues.
First, make sure to install the required tools:
```
pip3 install pydocstyle
```
```
pip3 install ruff
```
Then complete the following steps:
1. Run `pydocst…
-
### System Info
transformers: '4.45.1'
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### 🐛 Describe the bug
I have fine-tuned `Llama-3.2-11B-Vision-Instruct` fo…
-
### 🐛 Describe the bug
Create a simple distributed model.
Wrap the model with FSDP.
Using a stateful optimizer such as Adam(W), run without CPUOffload and profile/time it.
Then run with CPUOffload and see th…