Bin-ze / BEVFormer_segmentation_detection

Implemented BEVFormer support for BEV segmentation
Apache License 2.0
98 stars 9 forks

Loss computation #3

Closed boyforsky closed 1 year ago

boyforsky commented 1 year ago

Hello, I have run into two problems:

1. The `stat_scores` call on line 43 of metrics.py was originally given incorrect arguments. After I filled in the arguments according to the latest documentation, the results are still wrong: in the training results, the values for Divider | pred Crossing | Boundary | mIoU are all identical.

2. I cannot train the seg branch on its own. It fails with: RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss. If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).

How can I solve these problems? Many thanks.
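Regarding the first problem, one possible cause (an assumption, not confirmed anywhere in this thread) is that the statistics are being aggregated over all classes before the IoU is computed, so every class ends up reporting the same number. Recent torchmetrics releases default to a micro average for `stat_scores`, so it may be worth checking the `average` argument there. The sketch below is plain Python and is not the repository's metrics.py; the function names `per_class_iou` and `micro_iou` are hypothetical. It only illustrates how per-class counts differ from pooled counts.

```python
from typing import List, Sequence


def _counts(preds: Sequence[int], target: Sequence[int], num_classes: int):
    """Count true positives, false positives and false negatives per class."""
    if len(preds) != len(target):
        raise ValueError("preds and target must have the same length")
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, t in zip(preds, target):
        if p == t:
            tp[p] += 1
        else:
            fp[p] += 1  # predicted class p where it does not belong
            fn[t] += 1  # missed an occurrence of class t
    return tp, fp, fn


def per_class_iou(preds: Sequence[int], target: Sequence[int], num_classes: int) -> List[float]:
    """IoU computed separately for each class: TP / (TP + FP + FN)."""
    tp, fp, fn = _counts(preds, target, num_classes)
    ious = []
    for c in range(num_classes):
        denom = tp[c] + fp[c] + fn[c]
        ious.append(tp[c] / denom if denom else float("nan"))
    return ious


def micro_iou(preds: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """IoU from counts pooled over all classes: a single number."""
    tp, fp, fn = _counts(preds, target, num_classes)
    denom = sum(tp) + sum(fp) + sum(fn)
    return sum(tp) / denom if denom else float("nan")


# Toy labels for three classes (made-up data, for illustration only).
preds = [0, 0, 1, 1, 2, 2, 2, 0]
target = [0, 1, 1, 1, 2, 2, 0, 0]
print(per_class_iou(preds, target, 3))
print(micro_iou(preds, target, 3))
```

If the pooled value is reported once per class, the table shows identical entries in every column, which matches the symptom described above.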

Bin-ze commented 1 year ago

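Training a single branch does not support distributed training; it has to be run on a single GPU. If you hit the error above when training multiple branches, you need to set find_unused_parameters=True. Search the whole project for this parameter and set it to True; that resolves the error.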
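As a rough sketch of what that setting does: the flag is ultimately a keyword argument of `torch.nn.parallel.DistributedDataParallel`, as the error message itself says. The helper names `ddp_kwargs` and `wrap_for_ddp` below are hypothetical, and reading the flag from a top-level config key is an assumption about how this project wires it up, so check the actual training code.

```python
from typing import Any, Dict, Mapping


def ddp_kwargs(cfg: Mapping[str, Any]) -> Dict[str, Any]:
    """Build DistributedDataParallel keyword arguments from a config mapping.

    Assumes the config exposes a top-level `find_unused_parameters` key.
    """
    return {"find_unused_parameters": bool(cfg.get("find_unused_parameters", False))}


def wrap_for_ddp(model, cfg: Mapping[str, Any], local_rank: int):
    """Wrap a model in DDP. Requires an initialized process group and a GPU."""
    # Imported here so the config helper above works without torch installed.
    from torch.nn.parallel import DistributedDataParallel

    return DistributedDataParallel(
        model.cuda(local_rank),
        device_ids=[local_rank],
        **ddp_kwargs(cfg),
    )


# Hypothetical config fragment with the flag switched on.
cfg = dict(find_unused_parameters=True)
print(ddp_kwargs(cfg))
```

With the flag enabled, DDP tolerates parameters that receive no gradient in an iteration, at the cost of an extra traversal of the autograd graph on every step.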

boyforsky commented 1 year ago

The problem is solved. Thank you very much for your help.

boyforsky commented 1 year ago

When training a single branch on a single GPU with multiple worker threads, I often get the error RuntimeError: DataLoader worker (pid(s) 69021) exited unexpectedly.

Bin-ze commented 1 year ago

That looks like a hardware issue. Try adjusting num worker and similar settings, then run it again.
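A minimal sketch of that adjustment, assuming the project uses an MMDetection-style config in which the `data` dict carries a `workers_per_gpu` key (the key names and the helper `with_workers` are assumptions, not taken from this thread). Setting the worker count to 0 loads data in the main process, which is slower but usually surfaces the underlying exception instead of the generic "exited unexpectedly" message.

```python
import copy
from typing import Any, Dict


def with_workers(cfg: Dict[str, Any], num_workers: int) -> Dict[str, Any]:
    """Return a copy of the config with the DataLoader worker count overridden."""
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    new_cfg = copy.deepcopy(cfg)  # leave the original config untouched
    new_cfg.setdefault("data", {})["workers_per_gpu"] = num_workers
    return new_cfg


# Hypothetical starting config; the values are placeholders.
base_cfg = dict(data=dict(samples_per_gpu=1, workers_per_gpu=4))

# Debug run: no worker subprocesses, so errors are raised in the main process.
debug_cfg = with_workers(base_cfg, 0)
print(debug_cfg["data"])
```

If the run is stable with 0 workers, the count can be raised step by step. Running out of host memory or shared memory is a common reason for workers being killed, so those are worth watching while doing so.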

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>