matteosodano opened this issue 3 years ago
We never faced this problem. The factor for random scaling is chosen between 1.0 and 1.4, so it is quite unlikely to pick a batch full of void. Which dataset and batch size do you use for training?
I was using the SUNRGBD dataset with default parameters (thus, batch_size = 8). It was the very first run I did with the code, so I did not modify anything. I suspected the cropping because the error occurred at a random epoch, so it should not be a problem of a corrupted image or something similar.
I tried training with the SUNRGBD dataset and got the error `Loss is None`. Inspecting the code, it seems like it can only be caused by the loss function in `ESANet/src/utils.py`, and specifically here:

```python
number_of_pixels_per_class = torch.bincount(targets.flatten().type(self.dtype),
                                            minlength=self.num_classes)
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * self.weight)  # without void
losses.append(torch.sum(loss_all) / divisor_weighted_pixel_sum)
```
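For what it's worth, the zero divisor is easy to reproduce in isolation. Below is a minimal sketch (not the ESANet code itself; the class count, weights, and target shape are made up), assuming void is label 0 and the weight tensor holds one entry per non-void class:

```python
import torch

# Illustrative values only: e.g. 37 semantic classes + void, uniform weights.
num_classes = 38
weight = torch.ones(num_classes - 1)

# An "unlucky" crop in which every pixel is void (label 0).
targets = torch.zeros(4, 8, 8, dtype=torch.long)

number_of_pixels_per_class = torch.bincount(targets.flatten(),
                                            minlength=num_classes)
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * weight)

print(divisor_weighted_pixel_sum)                     # tensor(0.)
print(torch.tensor(0.) / divisor_weighted_pixel_sum)  # tensor(nan) -> the loss blows up
```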
My assumption is that `divisor_weighted_pixel_sum` can be 0 with some very 'unlucky' random cropping. The following modification seems to solve the problem:

```python
divisor_weighted_pixel_sum = torch.sum(number_of_pixels_per_class[1:] * self.weight).clamp(min=1e-5)  # without void
```
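An alternative to clamping would be to skip the division entirely when a crop contains no labelled pixels, so an all-void batch contributes zero loss instead of a value divided by a tiny constant. A rough sketch of that idea (the helper name and signature are mine, not from the repo):

```python
import torch

def weighted_loss_sum(loss_all, number_of_pixels_per_class, class_weight):
    """Illustrative helper: divide by the weighted non-void pixel count,
    falling back to a zero contribution when the crop is all void."""
    divisor = torch.sum(number_of_pixels_per_class[1:] * class_weight)  # without void
    if divisor > 0:
        return torch.sum(loss_all) / divisor
    # All-void crop: multiplying by zero keeps the result connected to the
    # graph while contributing nothing to the gradient.
    return torch.sum(loss_all) * 0.0
```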
Let me know if you ever experienced something similar, or if you have a better fix.