Luo-Z13 / pointobb

[CVPR2024] PointOBB: Learning Oriented Object Detection via Single Point Supervision
MIT License

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) #4

Closed xyh1108 closed 6 months ago

xyh1108 commented 6 months ago

Hello, I ran into this error during multi-GPU training: training ran out of GPU memory in the seventh epoch, although the first six epochs completed normally. What could be the cause?

Luo-Z13 commented 6 months ago

This might be a memory leak caused by a version mismatch. Please check whether your PyTorch version is compatible with your mmcv version. For more details, see https://github.com/ucas-vg/P2BNet/issues/13.
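
To check the versions, a small script like the one below can list what is installed in the training environment. It is only a sketch: the distribution names queried (`torch`, `torchvision`, `mmcv`, `mmcv-full`, `mmdet`) are assumptions about what your environment contains, and the helper names are made up for illustration. It reads package metadata only, so it works even if importing the packages themselves fails.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def collect_versions(names=("torch", "torchvision", "mmcv", "mmcv-full", "mmdet")):
    """Map each distribution name to its installed version (None if not installed).

    The default names are assumptions; adjust them to match your environment.
    """
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the printed versions against the compatibility requirements stated in the repository's installation instructions and in the linked P2BNet issue.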