Question about the dummy tensor causing an unused-parameter error in DDP

MIC-DKFZ / MedNeXt

[MICCAI 2023] MedNeXt is a fully ConvNeXt architecture for 3D medical image segmentation.

https://arxiv.org/pdf/2303.09975

Apache License 2.0

345 stars, 26 forks

Issue #27 (closed by guanjinquan 3 months ago)

guanjinquan commented 4 months ago

The relevant code is here: https://github.com/MIC-DKFZ/MedNeXt/blob/c5ed3f38b56d58c80581c75fea856865f42ddb75/nnunet_mednext/network_architecture/mednextv1/MedNextV1.py#L267

```python
# Used to fix PyTorch checkpointing bug
self.dummy_tensor = nn.Parameter(torch.tensor([1.]), requires_grad=True)
```
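A minimal sketch of why such a dummy tensor helps. It assumes the "checkpointing bug" is the known limitation of reentrant checkpointing (`torch.utils.checkpoint.checkpoint(..., use_reentrant=True)`): if none of the tensors passed to the checkpointed function require gradients, the output does not require gradients either, and the parameters inside the block silently never get trained. Passing a tensor with `requires_grad=True` as an extra argument avoids that. `Block` and `run` are hypothetical names, and the sketch assumes the dummy tensor is passed as an extra checkpoint argument. It also shows that the dummy tensor itself still ends up without a gradient, which is what DDP reports as an unused parameter.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Block(nn.Module):
    """Hypothetical stand-in for a checkpointed MedNeXt block."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv3d(1, 1, kernel_size=3, padding=1)

    def forward(self, x, dummy_tensor=None):
        # dummy_tensor is accepted but never used in the computation;
        # it only makes checkpoint() see an input with requires_grad=True.
        return self.conv(x)


def run(use_dummy: bool):
    """Return (output requires grad, conv got a grad, dummy got no grad)."""
    torch.manual_seed(0)
    block = Block()
    dummy = nn.Parameter(torch.tensor([1.0]), requires_grad=True)
    x = torch.randn(1, 1, 4, 4, 4)  # plain input: requires_grad=False
    args = (x, dummy) if use_dummy else (x,)
    # Reentrant checkpointing derives output.requires_grad from the inputs only.
    out = checkpoint(block, *args, use_reentrant=True)
    if out.requires_grad:
        out.sum().backward()
    return out.requires_grad, block.conv.weight.grad is not None, dummy.grad is None


print(run(use_dummy=False))  # block parameters are left without gradients
print(run(use_dummy=True))   # block parameters get gradients; dummy still does not
```

The second case is the situation in this issue: the checkpointed layers train correctly, but `dummy_tensor` never receives a gradient of its own.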

However, when I train MedNeXt with DDP (DistributedDataParallel), it throws an exception about an unused parameter: dummy_tensor.

How can I solve this problem?
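A small debugging sketch for finding which parameters DDP will complain about: after one ordinary backward pass (no DDP needed), list trainable parameters whose `.grad` is still `None`. `TinyNet` and `params_without_grad` are hypothetical names; `TinyNet` only mimics MedNeXt's registered-but-unused `dummy_tensor`. For reference, DDP itself can also print the offending names when the `TORCH_DISTRIBUTED_DEBUG=DETAIL` environment variable is set, and its `find_unused_parameters=True` constructor option tolerates such parameters at some extra cost per iteration.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical toy model that mimics MedNeXt's unused dummy_tensor."""

    def __init__(self):
        super().__init__()
        self.stem = nn.Conv3d(1, 4, kernel_size=1)
        # Registered parameter that never takes part in forward()
        self.dummy_tensor = nn.Parameter(torch.tensor([1.0]), requires_grad=True)

    def forward(self, x):
        return self.stem(x)


def params_without_grad(model: nn.Module):
    """Names of trainable parameters that got no gradient in the last backward.

    Assumes grads were None before backward (fresh model, or
    model.zero_grad(set_to_none=True)). By default, DDP expects a gradient
    for every such parameter, so these names trigger its unused-parameter error.
    """
    return [
        name
        for name, p in model.named_parameters()
        if p.requires_grad and p.grad is None
    ]


model = TinyNet()
model(torch.randn(2, 1, 4, 4, 4)).sum().backward()
print(params_without_grad(model))
```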

guanjinquan commented 4 months ago

I solved this problem with a crude workaround:

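```python
def forward(self, x):
    # Multiplying by dummy_tensor puts it in the autograd graph,
    # so DDP no longer flags it as an unused parameter
    x = self.stem(x) * self.dummy_tensor
    ...  # rest of MedNeXt's forward unchanged
```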

I hope this does not hurt MedNeXt's excellent performance.
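On that concern: with the multiplication workaround, `dummy_tensor` is a trainable scalar, so the optimizer can move it away from 1.0, and it then acts as a learned global scale on the stem output (later layers may well compensate, but that is not verified here). The sketch below uses a hypothetical toy `Stem` module to compare it with a variant that is not from this thread: adding `0.0 * self.dummy_tensor`. That variant still puts the parameter in the autograd graph, so DDP sees a gradient (here exactly zero), while the output is unchanged and, under plain SGD without weight decay, the dummy value stays at 1.0.

```python
import torch
import torch.nn as nn


class Stem(nn.Module):
    """Hypothetical toy model; `mode` selects how dummy_tensor is tied in."""

    def __init__(self, mode: str):
        super().__init__()
        self.mode = mode
        self.stem = nn.Conv3d(1, 4, kernel_size=1)
        self.dummy_tensor = nn.Parameter(torch.tensor([1.0]), requires_grad=True)

    def forward(self, x):
        x = self.stem(x)
        if self.mode == "multiply":
            # Workaround from this thread: output scaled by a learnable scalar
            return x * self.dummy_tensor
        # Alternative (assumption, not from the thread): dummy joins the
        # graph but contributes exactly zero to the output
        return x + 0.0 * self.dummy_tensor


def dummy_after_one_step(mode: str) -> float:
    """Value of dummy_tensor after one plain SGD step."""
    torch.manual_seed(0)
    model = Stem(mode)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    model(torch.randn(2, 1, 4, 4, 4)).sum().backward()
    # In both modes dummy_tensor receives a gradient, so DDP would count it as used
    assert model.dummy_tensor.grad is not None
    opt.step()
    return model.dummy_tensor.item()


print(dummy_after_one_step("multiply"))  # drifts away from 1.0
print(dummy_after_one_step("add_zero"))  # stays at 1.0
```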

saikat-roy commented 3 months ago

Hi, I'm sorry, but I haven't used nnUNet v1 with DDP and am not very familiar with its usage. I'm glad you solved the problem. I know of other teams that have used MedNeXt with DDP in nnUNet v2 with good results, so it should work in principle.