-
I use dMoE with DeepSpeed or FSDP. I find that at the beginning, the memory cost is about 33 GB. As training progresses, the occupied GPU memory increases little by little and finally exceeds 80 GB…
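A pattern like this is usually easiest to narrow down by logging allocator statistics every few hundred steps and seeing which part of the loop the growth tracks. A minimal sketch assuming a standard PyTorch training loop; `model`, `optimizer`, and `train_loader` in the commented loop are placeholders, not names from the report:

```python
import torch
import torch.distributed as dist

def log_cuda_memory(step, tag=""):
    # Allocator statistics for the current rank. Steady growth in "allocated"
    # across steps usually means tensors are being kept alive, e.g. losses
    # appended to a list with their autograd graph, or caches that never clear.
    alloc = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    peak = torch.cuda.max_memory_allocated() / 2**30
    rank = dist.get_rank() if dist.is_initialized() else 0
    print(f"[rank {rank}] step {step} {tag}: allocated={alloc:.2f} GiB "
          f"reserved={reserved:.2f} GiB peak={peak:.2f} GiB")

# Placeholder training loop showing where to call it:
#   for step, batch in enumerate(train_loader):
#       loss = model(**batch).loss
#       loss.backward()
#       optimizer.step()
#       optimizer.zero_grad(set_to_none=True)
#       if step % 200 == 0:
#           log_cuda_memory(step)
#           torch.cuda.reset_peak_memory_stats()
```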
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
Platform: Kaggle 2xT4
- `llamafactory` version: 0.8.4.dev0
- OS: Linux-5.15.154+-x86_64-with-glib…
-
## 🐛 Bug
If you try to train a model with the fully_sharded backend and use layer drop, training will hang. Each individual layer was also wrapped with FSDP in my particular case. It will be gre…
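For context on why it can hang (a hedged explanation, not something confirmed in the issue): when every layer is its own FSDP unit, each layer's forward issues an all-gather, so if layer drop lets ranks skip different layers the collectives stop matching and all ranks block. One way to keep them matched is to sample the drop pattern once and broadcast it, as in this sketch; `drop_prob` and the per-layer loop are placeholders:

```python
import torch
import torch.distributed as dist

def sample_layer_drop_mask(num_layers, drop_prob, device):
    # Sample the per-layer keep/drop decision on rank 0 and broadcast it, so
    # every rank runs exactly the same set of FSDP-wrapped layers. If ranks
    # sampled independently, their all-gathers would stop lining up and the
    # job would block inside a collective.
    mask = (torch.rand(num_layers, device=device) >= drop_prob).to(torch.uint8)
    if dist.is_initialized():
        dist.broadcast(mask, src=0)
    return mask.bool()

# Usage inside the model's forward (sketch; `self.layers` stands for the
# per-layer FSDP-wrapped modules):
#   mask = sample_layer_drop_mask(len(self.layers), drop_prob, x.device)
#   for keep, layer in zip(mask.tolist(), self.layers):
#       if keep:
#           x = layer(x)
```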
-
I'm trying what looks like the "Hello World" of this repo: Running the basic training on a Runpod community cloud `2 x RTX 4090, (128 vCPU 125 GB RAM)` configuration. Normally I'd play around with thi…
-
While training a speculator using the specu-train branch, I'm getting an OOM error when trying to load a checkpoint in HuggingFace format. The model_type is "gpt_megatron". The script works fine for…
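Without knowing the exact loading path in the specu-train branch, one thing that often avoids OOM while materializing a HuggingFace-format checkpoint is lazy loading with an explicit dtype. A hedged sketch; the path is a placeholder, and whether "gpt_megatron" resolves through `AutoModelForCausalLM` is an assumption:

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder path; the real checkpoint directory comes from the training setup.
ckpt_dir = "/path/to/hf_checkpoint"

# low_cpu_mem_usage avoids building a second full copy of the weights on the
# host while loading, and torch_dtype materializes them in half precision
# rather than fp32.
model = AutoModelForCausalLM.from_pretrained(
    ckpt_dir,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
```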
-
I haven't found a good multi-node best practice for FSDP. Have you tried it? Thank you in advance. :)
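One common multi-node pattern for FSDP is simply the single-node script launched with `torchrun` on every node, with rendezvous pointed at node 0. A minimal self-contained sketch; the toy model and hyperparameters are placeholders, not a recommendation from this repo:

```python
# Same script launched on every node, e.g. two nodes with 8 GPUs each:
#   torchrun --nnodes=2 --nproc_per_node=8 --node_rank=<0 or 1> \
#            --rdzv_backend=c10d --rdzv_endpoint=<node0-host>:29500 train_fsdp.py
import functools
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

class ToyBlock(nn.Module):
    # Stand-in block so the sketch is self-contained.
    def __init__(self, dim=1024):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(x)

def main():
    # torchrun exports RANK, WORLD_SIZE, LOCAL_RANK and the rendezvous settings.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    model = nn.Sequential(*[ToyBlock() for _ in range(8)]).cuda()
    policy = functools.partial(size_based_auto_wrap_policy, min_num_params=1_000_000)
    model = FSDP(model, auto_wrap_policy=policy)

    # One dummy step to show the flow; a real run would iterate a DataLoader
    # built with DistributedSampler so each rank sees a different shard.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(2, 16, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()

if __name__ == "__main__":
    main()
```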
-
Bumps [torch](https://github.com/pytorch/pytorch) from 2.2.0 to 2.2.1.
Release notes (sourced from torch's releases):
PyTorch 2.2.1 Release, bug fix release. This release is meant to fix the following …
-
I have been trying to finetune LLaMA; at 7B size on 8 V100 GPUs it takes longer than a day with the original `lora.py` script. This seemed wrong, since training times of a few hours are often reported. To remedy…
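The rest of the report is truncated, but a first diagnostic for a run this slow is to time a few steps and confirm all eight GPUs are actually busy; a hedged sketch in which `train_step` and `batch` are placeholders for the caller's own step function and data:

```python
import time

import torch

def time_steps(train_step, batch, n_warmup=3, n_timed=10):
    # Time a handful of steps after warm-up. CUDA kernels launch
    # asynchronously, so synchronize before reading the clock.
    for _ in range(n_warmup):
        train_step(batch)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_timed):
        train_step(batch)
    torch.cuda.synchronize()
    per_step = (time.perf_counter() - start) / n_timed
    print(f"{per_step:.2f} s/step; also check nvidia-smi to see whether all 8 GPUs show load")
    return per_step
```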
-
### 🐛 Describe the bug
Instantiate a model, wrap the model in FSDP with an autowrap policy, then wrap that FSDP-wrapped model in torch.compile, then try to checkpoint, and you will get a stack trac…
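A hedged repro sketch of that sequence (FSDP auto-wrap, then `torch.compile`, then checkpointing via `state_dict()`/`torch.save`, which is one reading of "checkpoint" here), meant to be run under `torchrun`; the toy model is a placeholder, not the reporter's:

```python
import functools
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

def main():
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    # 1) Instantiate a model and wrap it in FSDP with an auto-wrap policy.
    model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(4)]).cuda()
    policy = functools.partial(size_based_auto_wrap_policy, min_num_params=100_000)
    model = FSDP(model, auto_wrap_policy=policy)

    # 2) Wrap the FSDP-wrapped model in torch.compile.
    model = torch.compile(model)

    # 3) Run a step, then try to checkpoint; the reported stack trace appears here.
    loss = model(torch.randn(8, 1024, device="cuda")).sum()
    loss.backward()
    state = model.state_dict()
    if dist.get_rank() == 0:
        torch.save(state, "ckpt.pt")

if __name__ == "__main__":
    main()
```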
-
## 🐛 Bug
After training Llama-3-8b on 8 A100s for 10 iterations in eager mode, I printed the model weights:
```
torch_dist.barrier()
weights_after_training = benchmark.model.lm_head.weight[:10].…
```
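A self-contained variant of that check, as a hedged sketch rather than the benchmark harness above: snapshot a slice of `lm_head.weight` before training and compare it afterwards. It assumes the full `lm_head.weight` is directly addressable on each rank, as in the snippet, which may not hold once the module is FSDP-sharded:

```python
import torch
import torch.distributed as dist

def snapshot_lm_head(model, n_rows=10):
    # Clone so the snapshot does not alias the live parameter.
    return model.lm_head.weight[:n_rows].detach().clone()

def report_weight_change(model, before, n_rows=10):
    dist.barrier()  # make sure every rank has finished its optimizer steps
    after = model.lm_head.weight[:n_rows].detach()
    if dist.get_rank() == 0:
        delta = (after - before.to(after.device)).abs().max().item()
        print("before:", before)
        print("after :", after)
        print(f"max |after - before| over first {n_rows} rows: {delta:.3e}")
```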