Coobiw / MPP-LLaVA

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

Encountering RuntimeError Related to Process Group Initialization on RTX 3090 #1

Closed larrywo closed 8 months ago

larrywo commented 10 months ago

While attempting to train my model on an RTX 3090 GPU, I came across the following error: RuntimeError: Default process group has not been initialized. It seems like init_process_group has not been called.

I'd appreciate it if anyone could provide guidance on resolving this issue.


Coobiw commented 10 months ago

I guess you are not using DDP. The error occurs because dist.barrier() is called in a single-process run, where no default process group has been initialized. Just modify the dist.barrier() call in runner_base.py to:

if is_dist_avail_and_initialized():
    dist.barrier()

Then it will work! This has now been fixed. You can run git pull to get the latest code, or refer to the commit: https://github.com/Coobiw/MiniGPT4Qwen/commit/d13f9657614a6be7553c850b7f95b4c31832eeef
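
For reference, here is a minimal sketch of what such a guard typically looks like. The body of is_dist_avail_and_initialized() below is an assumption modeled on the common PyTorch pattern and may differ from the repo's own helper. safe_barrier() is a hypothetical wrapper name used only for illustration.

```python
import torch.distributed as dist


def is_dist_avail_and_initialized() -> bool:
    """Return True only if torch.distributed is usable AND a default
    process group has been created via init_process_group."""
    if not dist.is_available():
        return False
    return dist.is_initialized()


def safe_barrier() -> None:
    """Hypothetical wrapper: synchronize ranks when running distributed,
    otherwise do nothing (e.g. a plain single-GPU, single-process run)."""
    if is_dist_avail_and_initialized():
        dist.barrier()


if __name__ == "__main__":
    # Without a prior init_process_group call the barrier is skipped,
    # so this no longer raises the "Default process group has not been
    # initialized" RuntimeError.
    safe_barrier()
    print("distributed initialized:", is_dist_avail_and_initialized())
```

With this guard in place, the same training script should run both under a distributed launcher (where the process group is initialized and the barrier synchronizes all ranks) and as an ordinary single-process job on one RTX 3090.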