Wu0409 / DuPL

[CVPR'24] DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation.

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. #1

Closed fengxfeng0 closed 6 months ago

fengxfeng0 commented 6 months ago

Thanks for the awesome work! It is really interesting and powerful! But when I reproduced this work by training on the VOC dataset, an error occurred: "RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. Since find_unused_parameters=True is enabled, this likely means that not all forward outputs participate in computing loss. You can fix this by making sure all forward function outputs participate in calculating loss." (By the way, I only have a single GPU.) Could you give me some advice, please? Thanks for your help!

Wu0409 commented 6 months ago

Hey, thank you for your interest in our project! 😊

The issue you're encountering typically arises within the DDP training environment.

To address your specific situation, I recommend converting the DDP training script into a single-GPU training format. This can be done by eliminating certain DDP-specific initializations and wrappers, such as dist.init_process_group, DistributedDataParallel, and DistributedSampler.

Given that our codebase is built upon PyTorch's native DDP framework, making these modifications should be straightforward and not overly time-consuming.
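For reference, here is a minimal sketch of what that conversion looks like. This is not the DuPL training script itself, just an illustrative toy model and loop showing which DDP pieces to drop and what the single-GPU version keeps:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical tiny model standing in for the actual network.
model = nn.Linear(4, 2)

# --- Removed for single-GPU training ---
# dist.init_process_group(backend="nccl", ...)
# model = DistributedDataParallel(model, find_unused_parameters=True)
# sampler = DistributedSampler(dataset)

# --- Single-GPU equivalents ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Dummy data for illustration; use shuffle=True instead of a DistributedSampler.
dataset = TensorDataset(torch.randn(8, 4), torch.randint(0, 2, (8,)))
loader = DataLoader(dataset, batch_size=4, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

for inputs, labels in loader:
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()  # no gradient all-reduce happens without the DDP wrapper
    optimizer.step()
```

Since the "Expected to have finished reduction" error is raised by the DDP reducer, removing the `DistributedDataParallel` wrapper also makes that error go away on a single GPU.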

fengxfeng0 commented 6 months ago

Thank you for your reply! 😊 It really helps!