cardwing / Codes-for-Lane-Detection

Learning Lightweight Lane Detection CNNs by Self Attention Distillation (ICCV 2019)
MIT License
1.04k stars 333 forks

Questions about SAD loss and exist loss #305

Closed garryz94 closed 2 years ago

garryz94 commented 4 years ago

Hi @cardwing, I have two questions about training on my own data.

  1. When training within the warm-up epochs, everything went fine. But as soon as I activated the SAD loss, model performance degraded immediately!

  2. The exist loss is quite large during the training phase, staying around 0.4000.

Can you provide some advice for me? Thanks!

garryz94 commented 4 years ago

FYI, I am using ERFNet-SAD-Pytorch now

cardwing commented 4 years ago
  1. What is the coefficient of your SAD loss? In normal cases, the SAD loss (multiplied by the loss coefficient) should be much smaller than the main task loss. Maybe you can gradually increase the coefficient of the SAD loss during training since the effect of the SAD loss hinges on the quality of the model's attention maps.
  2. If the main task loss is smaller than the lane existence loss, you should assign a small coefficient to the existence loss.
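A minimal sketch of that weighting advice, keeping the SAD and existence terms small relative to the main task loss and ramping the SAD coefficient in after warm-up. All names and coefficient values here are illustrative, not the repo's actual code:

```python
# Hypothetical loss-combination sketch. `seg_loss`, `sad_loss`, and
# `exist_loss` stand for the main segmentation loss, the SAD distillation
# loss, and the lane-existence loss for one training step.

def total_loss(seg_loss, sad_loss, exist_loss, epoch,
               warmup_epochs=10, sad_coeff_max=0.1, exist_coeff=0.1):
    if epoch < warmup_epochs:
        sad_coeff = 0.0                      # SAD disabled during warm-up
    else:
        # Linear ramp from 0 to sad_coeff_max over another warmup_epochs,
        # so distillation only kicks in once attention maps are useful.
        ramp = min(1.0, (epoch - warmup_epochs) / warmup_epochs)
        sad_coeff = sad_coeff_max * ramp
    return seg_loss + sad_coeff * sad_loss + exist_coeff * exist_loss
```

With these example coefficients, a raw SAD loss even 10x larger than the main task loss contributes at most the same magnitude as the main loss after weighting.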
garryz94 commented 4 years ago
> 1. What is the coefficient of your SAD loss? In normal cases, the SAD loss (multiplied by the loss coefficient) should be much smaller than the main task loss. Maybe you can gradually increase the coefficient of the SAD loss during training since the effect of the SAD loss hinges on the quality of the model's attention maps.
> 2. If the main task loss is smaller than the lane existence loss, you should assign a small coefficient to the existence loss.

  1. I set the SAD loss coefficient to 0.1 now; it is more stable.

  2. My main task loss is much smaller than the exist loss (0.05 vs 0.40). I will try your advice.

  3. Another surprising fact: when I fine-tune on my custom data with your provided pretrained model, SAD seems to be less useful. Why does this happen?

cardwing commented 4 years ago

Maybe you can retrain your model with the SAD loss instead of finetuning. Besides, you can check if the attention maps of early layers indeed benefit from the guidance of deep layers after using the SAD loss.

garryz94 commented 4 years ago

My custom dataset has fewer than 10k images, so I have to fine-tune instead of training from scratch to avoid over-fitting. As for ERFNet, where should the SAD part be placed? Maybe I put it in the wrong position.

cardwing commented 4 years ago

I see. In the normal case, the attention maps of block 2 (layer 7) should learn from those of block 3 (layer 16), and the learning process is similar for block 1 (layer 1) and block 2. The current ERFNet code does not contain the SAD part.
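Since the repo does not ship the SAD part for ERFNet, here is a hedged sketch of how such a distillation term could be computed between two blocks, following the paper's description (sum of squared activations, then spatial softmax, then MSE). Shapes and the output size are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def attention_map(feat, out_size):
    # G2sum: sum of squared activations over channels -> (N, 1, H, W)
    amap = feat.pow(2).sum(dim=1, keepdim=True)
    # Resize both maps to a common resolution before the spatial softmax.
    amap = F.interpolate(amap, size=out_size, mode='bilinear',
                         align_corners=False)
    n, c, h, w = amap.shape
    # Spatial softmax over the H*W positions.
    return F.softmax(amap.view(n, c, -1), dim=-1).view(n, c, h, w)

def sad_loss(shallow_feat, deep_feat, out_size=(36, 100)):
    s = attention_map(shallow_feat, out_size)
    t = attention_map(deep_feat, out_size).detach()  # no grad to deeper block
    return F.mse_loss(s, t, reduction='mean')
```

The `detach()` matters: the deeper block acts as the teacher, so gradients from the SAD term should only update the shallower block.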

garryz94 commented 4 years ago

> I see. In the normal case, the attention maps of block 2 (layer 7) should learn from those of block 3 (layer 16), and the learning process is similar for block 1 (layer 1) and block 2. The current ERFNet code does not contain the SAD part.

As I see it, layer 1 is the first DownsamplerBlock, layer 7 is the non_bottleneck_1d block before the last DownsamplerBlock, and layer 16 is the last non_bottleneck_1d block before the output_conv. Am I right?
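One generic way to grab the feature maps at candidate layers, without editing the network itself, is PyTorch forward hooks. The toy model below is a stand-in, not ERFNet; which modules to hook depends on the actual layer indexing discussed above:

```python
import torch
import torch.nn as nn

feats = {}

def save_output(name):
    # Returns a hook that stores the module's output under `name`.
    def hook(module, inputs, output):
        feats[name] = output
    return hook

# Stand-in for an encoder; in practice you would hook the real
# ERFNet modules (e.g. the blocks at layers 1, 7, and 16).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 64, 3, stride=2, padding=1),
)
model[0].register_forward_hook(save_output('block1'))
model[2].register_forward_hook(save_output('block2'))

_ = model(torch.randn(1, 3, 288, 800))  # CULane-style input size
```

After the forward pass, `feats['block1']` and `feats['block2']` hold the intermediate maps needed to compute attention maps for the SAD term.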

garryz94 commented 4 years ago

Another fun fact: when I train DeepLabv3 on CULane, its performance lags well behind simpler networks such as ENet and ERFNet. And when I activate SAD on DeepLab, its performance drops rapidly. Why does this happen?

cardwing commented 4 years ago

Yes. And this is weird. DeepLabV3 has many more parameters than ENet and should at least achieve comparable performance. It seems that DeepLabV3 has not been trained well. The problem may lie in the network architecture or the training stage.

garryz94 commented 4 years ago

Indeed, ERFNet without SAD achieves the best performance on my custom dataset, compared with ERFNet-SAD, DeepLabv3, and DeepLabv3-SAD. It is strange. Maybe it's a kind of overfitting, since my custom dataset is quite small (about 7000 training images). Anyway, thanks for your great work and your quick replies!

nisha1729 commented 4 years ago

> FYI, I am using ERFNet-SAD-Pytorch now

Hey, could you tell me which repo this is? I'm also trying to use SAD with ERFNet. Did you write your own SAD code?

cardwing commented 4 years ago

The PyTorch code does not contain the SAD part. You need to write it by yourself.

wmuron commented 4 years ago

@cardwing Could you please clarify, or at least point to an existing implementation of, how the attention masks used for SAD are calculated? I have trouble interpreting the sentence from the paper (https://arxiv.org/abs/1908.00821) saying "Following [24], we also perform spatial softmax operation Φ G2sum". If I understand correctly, the result of G2sum is of size 1xHxW, so applying a spatial (element-wise) softmax would not make any sense, right? Thanks for the help!

cardwing commented 4 years ago

The spatial softmax is performed over the H×W dimensions to normalize the pixel values. Please refer to previous issues for more details.
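To illustrate the distinction raised above, a tiny sketch contrasting the spatial softmax (flatten H×W, normalize over positions) with the degenerate channel-dim softmax on a single-channel map:

```python
import torch
import torch.nn.functional as F

# `a` stands in for the 1 x H x W result of G2sum, with a batch dim added.
a = torch.randn(1, 1, 4, 5)

# Softmax over the channel dimension (size 1) is degenerate:
# every value becomes exactly 1.
chan = F.softmax(a, dim=1)

# Spatial softmax: flatten H*W, softmax over positions, reshape back,
# so the H*W values form a distribution summing to 1.
n, c, h, w = a.shape
spatial = F.softmax(a.view(n, c, -1), dim=-1).view(n, c, h, w)
```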

noobliang commented 3 years ago

> Hi @cardwing, I have two questions about training on my own data.
>
> 1. When training within the warm-up epochs, everything went fine. But as soon as I activated the SAD loss, model performance degraded immediately!
> 2. The exist loss is quite large during the training phase, staying around 0.4000.
>
> Can you provide some advice for me? Thanks!

Hi @garryz94, I have some questions about the SAD MSE loss; could you help me, please? My SAD MSE loss is very large, about 300+. The MSE loss is computed between the attention map and the target, but the values in the attention map are 0~1 after softmax while the segmentation label values are 0, 1, 2, 3, 4. Have I misunderstood the code? Can you provide some advice? Thanks!

garryz94 commented 3 years ago

> Hi @garryz94, I have some questions about the SAD MSE loss; could you help me, please? My SAD MSE loss is very large, about 300+. The MSE loss is computed between the attention map and the target, but the values in the attention map are 0~1 after softmax while the segmentation label values are 0, 1, 2, 3, 4. Have I misunderstood the code?

Hi @noobliang, I set reduction="mean" when calculating the SAD MSE loss, so the loss is quite small.
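For illustration, the `reduction` setting only changes the scale of the MSE loss, not what it measures; with `reduction='sum'` the value grows with the number of elements. Shapes here are illustrative:

```python
import torch
import torch.nn as nn

pred = torch.rand(1, 1, 36, 100)    # e.g. a spatial-softmaxed attention map
target = torch.rand(1, 1, 36, 100)  # target attention map on the same scale

mse_sum = nn.MSELoss(reduction='sum')(pred, target)
mse_mean = nn.MSELoss(reduction='mean')(pred, target)
# 'sum' equals 'mean' times the number of elements (3600 here),
# which is why reduction='sum' can look alarmingly large.
```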

garryz94 commented 3 years ago

> > FYI, I am using ERFNet-SAD-Pytorch now
>
> Hey, could you tell me which repo this is? I'm also trying to use SAD with ERFNet. Did you write your own SAD code?

@nisha1729 sorry for the late reply. I use this repo: Codes-for-Lane-Detection/ERFNet-CULane-PyTorch/ and you need to write your own code to calculate the SAD loss.

noobliang commented 3 years ago

> Hi @noobliang, I set reduction="mean" when calculating the SAD MSE loss, so the loss is quite small.

Thank you for your reply and advice. But I found that SAD gives little improvement in accuracy. Did you see any improvement? I don't know where I went wrong.