-
![image](https://user-images.githubusercontent.com/20476674/212067731-50506295-9e27-41f3-ab25-558ade9e5fbb.png)
It seems that a 32 GB GPU is not enough. How much GPU memory is needed for normal op…
-
### System Info
transformers: v4.45.0 and up (any of v4.45.0 / v4.45.1 / v4.45.2)
accelerate: v1.0.1 (same result on v0.34.2)
### Who can help?
trainer experts: @muellerzr @SunMarc
accelerate exp…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…
-
I noticed that when compiling a small microbenchmark (with a warm inductor cache), end-to-end compile times were ~4 s with CUDA tensors and ~15 s with CPU tensors. It looks like the majority of the extra time …
-
The original `run.py` saves the model as `pytorch_model.bin`, which cannot be loaded directly using the code provided in this repository. After replacing line 422 `trainer.save_model()` in `training/run.py`…
-
### System Info
transformers: '4.45.1'
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### 🐛 Describe the bug
I have fine-tuned `Llama-3.2-11B-Vision-Instruct` fo…
-
I added 4-bit loading to the command for LoRA training with ZeRO-3 on two or more GPUs, to combine QLoRA with ZeRO-3. However, the program raised the following error:
RuntimeError: expected ther…
-
Hi authors,
I am curious about the model's performance on the Waymo dataset, which is not mentioned in the paper. Have you conducted any relevant experiments, and what were the res…
-
Thank you very much for this work, which is very helpful for those of us with limited training resources. I am a newcomer to the field of NLP and am not very familiar with training frame…
-
Hi everyone!
I want to retrain ESM3 in a distributed way. How can I shard the ESM3 model across GPUs?
Thanks:)