Beckschen / 3D-TransUNet

This is the official repository for the paper "3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers"
Apache License 2.0

An error: RuntimeError: Function 'SigmoidBackward0' returned nan values in its 0th output. #24

Open xixixihean opened 8 months ago

xixixihean commented 8 months ago

In nnUNetTrainerV2_DDP.py, the following code raises an error:

    with autocast(enabled=False):
        output_act = output_ds[i].sigmoid() if is_sigmoid else softmax_helper(output_ds[i])  # bug occurs here..

RuntimeError: Function 'SigmoidBackward0' returned nan values in its 0th output.

Could you tell me how to solve it, please? Thank you.
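For context on what this message can mean: the derivative of the sigmoid is y * (1 - y), so its backward pass multiplies the incoming gradient by that factor. The following is a plain-Python sketch of that arithmetic, not the repository's code; `sigmoid` and `sigmoid_backward` are hypothetical helper names. It shows two ways the result can become NaN: an infinite gradient arriving from the loss at a saturated output, or a NaN already present in the forward output.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable scalar sigmoid (illustrative helper)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_backward(grad_out: float, y: float) -> float:
    """Gradient w.r.t. the sigmoid input, given the upstream gradient and output y."""
    return grad_out * y * (1.0 - y)


# Case 1: the output saturates to exactly 1.0, so y * (1 - y) is 0.0.
# A finite upstream gradient gives 0.0, but an infinite one gives inf * 0 = NaN.
y_saturated = sigmoid(50.0)
print(sigmoid_backward(2.0, y_saturated))
print(sigmoid_backward(float("inf"), y_saturated))

# Case 2: the forward output is already NaN, so the gradient is NaN as well.
print(sigmoid_backward(1.0, float("nan")))
```

If either case applies here, the sigmoid line would only be where the NaN is reported, and the source would be either the loss computed on `output_act` or the network outputs in `output_ds[i]`.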

TaWald commented 8 months ago

I got the same issue. For me it always happens at around epoch 120.

Here is my stacktrace:

  File "/dkfz/cluster/gpu/data/OE0441/t006d/Code/transunet3d/nn_transunet/trainer/nnUNetTrainerV2_DDP.py", line 1039, in run_training
    l = self.run_iteration(self.tr_gen, True)
  File "/dkfz/cluster/gpu/data/OE0441/t006d/Code/transunet3d/nn_transunet/trainer/nnUNetTrainerV2_DDP.py", line 552, in run_iteration
    l = self.compute_loss(output, target, is_max, is_c2f, self.args.is_sigmoid, is_max_hungarian, is_max_ds, point_rend, num_point_rend, no_object_weight)
  File "/dkfz/cluster/gpu/data/OE0441/t006d/Code/transunet3d/nn_transunet/trainer/nnUNetTrainerV2_DDP.py", line 658, in compute_loss
    output_act = output_ds[i].sigmoid() if is_sigmoid else softmax_helper(output_ds[i]) # bug occurs here..
 (Triggered internally at ../torch/csrc/autograd/python_anomaly_mode.cpp:113.)
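
One way to narrow down where the NaN first appears is to check the tensors around the failing line for non-finite values before calling backward. The following is a minimal sketch using standard PyTorch calls (`torch.isfinite`); `first_nonfinite` is a hypothetical helper, not part of this repository, and the tensors below are toy stand-ins for `output_ds[i]` and `output_act`.

```python
import torch


def first_nonfinite(named_tensors):
    """Return the name of the first tensor containing NaN or Inf, else None."""
    for name, tensor in named_tensors:
        if not torch.isfinite(tensor).all():
            return name
    return None


# Toy stand-ins: a logit tensor with one NaN and its sigmoid activation.
logits = torch.tensor([0.5, float("nan"), -1.0])
probs = logits.sigmoid()

# Checks in forward order, so the earliest offending tensor is reported.
print(first_nonfinite([("logits", logits), ("probs", probs)]))
```

In a training loop the same check could be applied to the network outputs, the activations, and the loss value in that order. If the logits are already non-finite, the problem lies upstream of the sigmoid (for example in the weights after an earlier bad update); if only the loss is non-finite, the loss terms are the place to look.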