-
### Bug description
I am using the default configs, code, and data to train a model within the BioNeMo framework. A timeout occurs in the middle of training.
### What version are you seeing the p…
-
I'm working on upgrading my project to Meteor 3.0-rc.2.
I've been adding in one package at a time and have hit a snag with meteor-fast-render. If I remove `communitypackages:fast-render@4.0.9` the…
-
When training with two GPUs, it stays stuck at "Load backbone weights" for about three to four minutes before continuing.
-
### System Info
- `transformers` version: 4.40.1
- Platform: Linux-5.10.214-202.855.amzn2.x86_64-x86_64-with-glibc2.35
- Python version: 3.10.14
- Huggingface_hub version: 0.23.0
- Safetensors …
-
Thanks for the great repo.
I have two questions about training the models (specifically WizardCoder):
1. Have you tried training with QLoRA, and not just LoRA? Are you considering adding it to t…
-
Thank you for your excellent work. You used a single V100 GPU for training. Will the programme support distributed training? We are trying to use multiple 4090 GPUs on the same machine to repeat the e…
-
### Describe the bug
Hello, training XTTSv2 from Coqui TTS leads to weird training lags when using DDP.
Hardware: 6x RTX A6000 and 512 GB RAM.
Here is the GPU load monitoring graph. Purple - gpu0, green - gpu1 (a…
-
Thanks for your code!
Could you share the scripts for DDP training?
-
After a successful build, I get an error when launching legged_robot_ddp.launch.py under ocs2_legged_robot_ros:
![82a1718b08ca3417cfb21707ca68ffb](https://github.com/user-attachments/assets/d3cf155c-5610-4cdb-b7cc-7e2ffe4e5067)
Is it legged_robot_ta…