Open ayushi-3536 opened 1 year ago
Hi,
I was training the network with validation on the COCO dataset using 8 GPUs on a single node. The network seems to hang when it calls the reduce_tensor method inside validate_cls(). Is there a known solution for this issue?
Hi @ayushi-3536
I haven't encountered this issue.
Could you please provide your PyTorch version?
Could you also try running validation first?
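While narrowing this down, it may help to keep in mind one common cause of hangs in distributed reductions: a collective call only returns once every rank has entered it. The sketch below is a standard-library simulation of that behaviour, not this repository's code. It assumes reduce_tensor wraps a collective such as torch.distributed.all_reduce (the thread does not show its body), and the names simulate_collective and participating_ranks are hypothetical.

```python
import threading


def simulate_collective(world_size, participating_ranks, timeout=1.0):
    """Simulate one collective call across `world_size` ranks.

    Each rank is a thread. Ranks listed in `participating_ranks` enter the
    collective (modelled here with threading.Barrier); the others skip it,
    for example because their validation loop ran fewer iterations.
    Returns a dict mapping rank -> "reduced", "hung", or "skipped".
    """
    barrier = threading.Barrier(world_size)
    status = {}
    lock = threading.Lock()

    def worker(rank):
        if rank not in participating_ranks:
            outcome = "skipped"
        else:
            try:
                # Returns only when all `world_size` ranks have arrived.
                barrier.wait(timeout=timeout)
                outcome = "reduced"
            except threading.BrokenBarrierError:
                # A real collective would block here instead of timing out.
                outcome = "hung"
        with lock:
            status[rank] = outcome

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status


if __name__ == "__main__":
    # All 8 ranks reach the collective.
    print(simulate_collective(8, set(range(8))))
    # Rank 7 never reaches it, so ranks 0-6 wait until the timeout.
    print(simulate_collective(8, set(range(7))))
```

If this turns out to be the situation here, checking that every GPU process runs the same number of validation iterations would be a reasonable first step; this is a guess at a possible cause, not a confirmed diagnosis.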