erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and WAV file maintenance. It can also be used with 3rd-party software via JSON calls.
GNU Affero General Public License v3.0

Training with DDP doesn't work, hangs at init_process_group #305

Closed Eyalm321 closed 3 months ago

Eyalm321 commented 3 months ago

🔴 If you have installed AllTalk in a custom Python environment, I will only be able to provide limited assistance/support. AllTalk draws on a variety of scripts and libraries that are not written or managed by myself, and they may fail, error, or give strange results in custom-built Python environments.

🔴 Please generate a diagnostics report and upload the "diagnostics.log" as this helps me understand your configuration.

https://github.com/erew123/alltalk_tts/tree/main?#-how-to-make-a-diagnostics-report-file

Installed on Docker from the Dockerfile + docker-compose up -d. Hardware: 4x NVIDIA V100 SXM2 16 GB, 192 GB RAM, 44-core Intel Xeon 2699-v4.
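
For reference, a quick PyTorch check like the sketch below (illustrative only, not part of AllTalk) confirms whether all four V100s are actually visible inside the container:

```python
# Verify GPU visibility inside the Docker container before testing DDP.
import torch

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```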

Describe the bug

Upon setting use_ddp=True in TrainerArgs, training hangs right at init_process_group from distributed.py

To Reproduce: fresh Docker install as-is, add use_ddp=True, and run the training.
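
For context, this is roughly what that change looks like, a minimal sketch assuming the TrainerArgs dataclass from the Coqui trainer package that the training script builds on (everything besides use_ddp is omitted):

```python
# Sketch of enabling DDP in a Coqui-style training script; the surrounding
# AllTalk finetuning code is omitted.
from trainer import TrainerArgs

trainer_args = TrainerArgs(
    use_ddp=True,  # request multi-GPU DistributedDataParallel training
)
# Note: torch.distributed.init_process_group() waits for every rank in the
# process group to join, so DDP expects one training process per GPU to be
# launched.
```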


Text/logs: It just hangs when using DDP; there is no error and no CPU, GPU, or memory usage.
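
As an illustrative aside (not AllTalk code), a standalone script such as the hypothetical ddp_check.py below can show whether torch.distributed is able to rendezvous at all inside the container when launched with torchrun --nproc_per_node=4:

```python
# ddp_check.py - hypothetical standalone test of the DDP rendezvous; launch with:
#   torchrun --nproc_per_node=4 ddp_check.py
import datetime
import os

import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT for us.
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    # A short timeout turns a silent hang into a visible error.
    dist.init_process_group(backend="nccl", timeout=datetime.timedelta(seconds=60))
    torch.cuda.set_device(rank % torch.cuda.device_count())
    x = torch.ones(1, device="cuda")
    dist.all_reduce(x)  # every rank should end up with x == world_size
    print(f"rank {rank}/{world_size}: all_reduce ok, x = {x.item()}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```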

Desktop (please complete the following information):
- AllTalk was updated: 01 July 2024
- Custom Python environment: no
- Text-generation-webUI was updated: [approx. date]


erew123 commented 3 months ago

Hi @Eyalm321

Last I knew, the Coqui scripts, which run the backend of the training, don't support multi-GPU training for the XTTS model: https://github.com/coqui-ai/TTS/issues/3132#issuecomment-1798437571

Currently the backend Coqui scripts are being maintained here: https://github.com/idiap/coqui-ai-TTS. FYI, this person is NOT a member of Coqui and has never worked for Coqui, and as I understand it they do this in their free time, so I am unable to comment on how quickly they may be able to respond.

They will be your best bet, though, for looking into and resolving multi-GPU capability with the training scripts.

Thanks