### Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
### Expected Behavior
Only the main process should be running on card 0.
### Current behaviour
When performing distributed training on a single machine with multiple cards, e.g., 2 cards, 2 processes are spawned on card 0. This frequently causes out-of-memory (OOM) errors.
Using DeepSpeed does not resolve the issue either.
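
A minimal sketch for confirming the process placement described above, assuming `nvidia-smi` is on the PATH and that its `--query-compute-apps` CSV output lists the GPU UUID and PID in the first two columns. The helper name `processes_per_gpu` is hypothetical and not part of axolotl. It only groups PIDs by GPU so that a card hosting more than one training process stands out.

```python
import csv
import io
import shutil
import subprocess
from collections import defaultdict

# Query assumed to return one CSV row per compute process: gpu_uuid, pid, used_memory
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=gpu_uuid,pid,used_memory",
    "--format=csv,noheader",
]


def processes_per_gpu(csv_text):
    """Group compute-process PIDs by GPU UUID (hypothetical helper)."""
    by_gpu = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        gpu_uuid, pid = row[0].strip(), row[1].strip()
        if pid.isdigit():
            by_gpu[gpu_uuid].append(int(pid))
    return dict(by_gpu)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
        for gpu, pids in processes_per_gpu(result.stdout).items():
            print(f"{gpu}: {len(pids)} process(es) -> {pids}")
```

If the behaviour reported here is present, the UUID belonging to card 0 would be expected to list two PIDs while training is running.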
### Possible solution
_No response_
### Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
### Python Version
3.10
### axolotl branch-commit
main
### Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
### Steps to reproduce
### Config yaml