LeeJunHyun / Image_Segmentation

Pytorch implementation of U-Net, R2U-Net, Attention U-Net, and Attention R2U-Net.

Unstable performances in validation set and prediction collapse #73

Closed frmrz closed 1 year ago

frmrz commented 3 years ago

Hi everyone, many thanks to @LeeJunHyun for the awesome implementations.

I'm using this repo to segment medical grayscale images on a private dataset. I have a problem when training the models with the Recurrent blocks that I'm not able to solve, and I'd like to know if anyone has had the same problem or has any idea how to solve it.

My loss (I use mainly Dice or Jaccard) behaves as expected on the training set, with a slow but steady decrease. On the validation set, by contrast, I get one epoch with good results and the next epoch with all-zero masks or very bad predictions, and this pattern repeats almost constantly. Moreover, when visualizing the predictions I noticed (even for the "good" epochs) a sort of "prediction collapse": for diverse inputs the model predicts very similar outputs in terms of the shape of the segmented area.

The same problem arises with different batch sizes, learning rates, augmentation, pre-processing, and other parameters. I checked the input images and they are consistent with what I want to feed the network. I also tried the R2U and Att-R2U models in the "PyTorch-segmentation-models" repository by @qubvel and I have the same problem.

I think I am missing something in the R2U model, because aside from these problems my results using this repository are very good.

Thanks to everyone who interacts with this post. Bye!

gaowenjingshitou commented 3 years ago

I met the same problem. Did you resolve it?

frmrz commented 3 years ago

No, sadly I still have the problem.

BaochangZhang commented 3 years ago

Still waiting for an answer as well.

BaochangZhang commented 3 years ago

I just solved it. I found that the residual block was used in the wrong way. You can use the following code:

```python
class RRCNN_block(nn.Module):
    """Recurrent Residual Convolutional Neural Network Block"""

    def __init__(self, in_ch, out_ch, t=2):
        super(RRCNN_block, self).__init__()
        self.RCNN = nn.Sequential(
            Recurrent_block(out_ch, t=t),
            Recurrent_block(out_ch, t=t)
        )
        self.Conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )
        self.activate = nn.ReLU(inplace=True)
        # self.Conv = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        x1 = self.Conv(x)
        x2 = self.RCNN(x1)
        out = self.activate(x1 + x2)
        return out
```
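For readers who want to try the corrected block in isolation, here is a self-contained sketch. Note that `Recurrent_block` below is my reconstruction of the block in this repo, written slightly more compactly (conv applied once, then `t` more times on the residual sum); treat it as an assumption rather than the repo's exact code:

```python
import torch
import torch.nn as nn

class Recurrent_block(nn.Module):
    """Recurrent conv block (reconstruction of the repo's version, assumed equivalent)."""
    def __init__(self, out_ch, t=2):
        super(Recurrent_block, self).__init__()
        self.t = t
        self.conv = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x1 = self.conv(x)
        for _ in range(self.t):
            # Recurrence: refine x1 using the original input as a skip term.
            x1 = self.conv(x + x1)
        return x1

class RRCNN_block(nn.Module):
    """Corrected block: 1x1 conv projects channels first, then the
    recurrent path, then a residual sum followed by ReLU."""
    def __init__(self, in_ch, out_ch, t=2):
        super(RRCNN_block, self).__init__()
        self.RCNN = nn.Sequential(
            Recurrent_block(out_ch, t=t),
            Recurrent_block(out_ch, t=t),
        )
        self.Conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.activate = nn.ReLU(inplace=True)

    def forward(self, x):
        x1 = self.Conv(x)               # project in_ch -> out_ch
        x2 = self.RCNN(x1)              # recurrent refinement
        return self.activate(x1 + x2)   # residual connection

block = RRCNN_block(in_ch=1, out_ch=64, t=2)
y = block(torch.randn(2, 1, 32, 32))
print(tuple(y.shape))  # -> (2, 64, 32, 32): spatial size preserved, channels projected
```

The key difference from the broken version is that the 1x1 projection runs *before* the recurrent path, so the residual sum `x1 + x2` adds tensors with matching channel counts.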
frmrz commented 3 years ago

thanks @BaochangZhang, as soon as I have time I will give you feedback on whether it works on my data

FunkyKoki commented 3 years ago

> just now, i solved it. I find that the residual block is used in a wrong way. you can use the follow codes: [...]

Thank you so much! You are so great~

pangda72 commented 2 years ago

> Hi everyone, many thanks to @LeeJunHyun for the awesome implementations. [...]

Hi @frmrz, sorry to bother you. I'm using this repo to segment medical grayscale images on a private dataset too, but I ran into a problem: using mainly Dice or Jaccard loss, the loss does not decrease during training and the metric goes to zero. I don't know how to solve this; could you give me some help? Thanks in advance. Best

frmrz commented 2 years ago

> Hi everyone, many thanks to @LeeJunHyun for the awesome implementations. [...]

> Hi @frmrz Sorry to bother you. [...] could you give me some help?

Hi, have you tried the fix proposed by @BaochangZhang?

Deng-GuiFeng commented 2 years ago

> Hi everyone, many thanks to @LeeJunHyun for the awesome implementations. [...]

Hi everyone, I met the same problem. I am trying the fix proposed by @BaochangZhang and will be grateful if it works.

I also want to share my view on why this happens. I think it is because the same BatchNorm layer is used twice inside Recurrent_block: during training its parameters and running statistics are updated on both passes, but only a single set is stored. So when you call model.eval() to test, the same stored statistics are applied at both uses of the BN in Recurrent_block, which differs from what the layer saw during training.

To work around it, I tried testing with model.train() and lr=0. In that mode the BN statistics depend only on the data fed in each batch, so the predictions end up influenced by batch size and input order. To be honest, I don't have an idea how to fix it properly, as this is not mentioned in the paper. What I said above is only my view; I am not sure whether it is right.
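The train/eval mismatch described above can be illustrated without PyTorch. BatchNorm keeps one running mean per layer, updated with an exponential moving average; if the same layer is applied at two points of the network whose activations have different distributions, the single stored statistic settles *between* the two, matching neither call site in eval mode. A toy sketch (plain Python, with hypothetical per-batch means; `update_running_mean` just mimics BatchNorm's EMA rule):

```python
# Toy model of BatchNorm running-mean tracking when ONE layer is
# reused at two network positions with different activation stats.
MOMENTUM = 0.1

def update_running_mean(running, batch_mean):
    # Same exponential-moving-average rule BatchNorm uses in train mode.
    return (1 - MOMENTUM) * running + MOMENTUM * batch_mean

running_mean = 0.0
# Hypothetical batch means observed at the two call sites of the shared layer:
mean_first_pass, mean_second_pass = 0.0, 5.0

for _ in range(1000):  # many training steps; each step updates the SAME stat twice
    running_mean = update_running_mean(running_mean, mean_first_pass)
    running_mean = update_running_mean(running_mean, mean_second_pass)

# The single stored statistic converges to a value between 0.0 and 5.0,
# so in eval mode BOTH call sites are normalized with a mean that is
# wrong for each of them individually.
print(round(running_mean, 2))  # -> 2.63
```

The fixed point is `5 * m / (1 - (1 - m)^2) ≈ 2.63` with `m = 0.1`: a compromise between the two distributions, consistent with the unstable eval-mode behavior reported in this thread.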