FYI, I am using ERFNet-SAD-Pytorch now
- What is the coefficient of your SAD loss? In normal cases, the SAD loss (multiplied by its coefficient) should be much smaller than the main task loss. Maybe you can gradually increase the coefficient of the SAD loss during training, since the effect of the SAD loss hinges on the quality of the model's attention maps.
- If the main task loss is smaller than the lane existence loss, you should assign a small coefficient to the existence loss (see the sketch after this list for one way to combine the terms).
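For example, here is a minimal sketch of one way to combine the three terms, assuming `seg_loss`, `exist_loss`, and `sad_loss` are already computed per batch (the warmup length and coefficients are placeholders to tune, not values from the paper):

```python
def total_loss(seg_loss, exist_loss, sad_loss, epoch,
               warmup_epochs=10, sad_max=0.1, exist_coeff=0.1):
    # Keep SAD off during warmup, then ramp its coefficient linearly so the
    # (initially noisy) attention maps cannot dominate the main task loss.
    sad_coeff = 0.0
    if epoch >= warmup_epochs:
        sad_coeff = sad_max * min(1.0, (epoch - warmup_epochs) / warmup_epochs)
    return seg_loss + exist_coeff * exist_loss + sad_coeff * sad_loss
```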
I set the SAD loss coefficient to 0.1 now; it is more stable.
My main task loss is much smaller than the exist loss (0.05 vs. 0.40). I will try your advice.
Another surprising fact: when I fine-tune on my custom data from your provided pretrained model, SAD seems to be less useful. Why does this happen?
Maybe you can retrain your model with the SAD loss instead of fine-tuning. Besides, you can check whether the attention maps of the early layers indeed benefit from the guidance of the deep layers after using the SAD loss.
My custom dataset has fewer than 10k images, so I have to fine-tune instead of training from scratch to avoid over-fitting. As for ERFNet, where should SAD be placed? Maybe I put it in the wrong position.
I see. In the normal case, the attention maps of block 2 (layer 7) should learn from those of block 3 (layer 16), and the learning process is similar for block 1 (layer 1) and block 2. The current ERFNet code does not contain the SAD part.
As I see it, layer 1 is the first DownsamplerBlock, layer 7 is the non_bottleneck_1d block before the last DownsamplerBlock, and layer 16 is the last non_bottleneck_1d block before the output_conv. Am I right?
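For reference, this is roughly how I cache those three block outputs with forward hooks (my own sketch; the `layers` attribute and the 0-based indices are my assumptions about the ERFNet definition, not verified against the released code):

```python
def attach_sad_taps(model, feats):
    """Register forward hooks that cache the three block outputs for SAD.

    Assumes `model` exposes its blocks as a `layers` ModuleList; the
    0-based indices are my guesses for layers 1, 7 and 16 as counted above.
    """
    def make_hook(name):
        def hook(module, inputs, output):
            feats[name] = output  # stash this block's feature map
        return hook

    for name, idx in [("block1", 0), ("block2", 6), ("block3", 15)]:
        model.layers[idx].register_forward_hook(make_hook(name))
```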
Another fun fact: when I train DeepLabV3 on CULane, its performance lags far behind simpler networks such as ENet and ERFNet. And when I activate SAD on DeepLabV3, its performance drops rapidly. Why would this happen?
Yes. And this is weird. DeepLabV3 has many more parameters than ENet and should at least achieve comparable performance. It seems that DeepLabV3 has not been trained well. The problem may lie in the network architecture or the training stage.
Indeed, ERFNet without SAD achieves the best performance on my custom dataset, compared with ERFNet-SAD, DeepLabV3, and DeepLabV3-SAD. It is so weird. Maybe it's a kind of overfitting, because my custom dataset is quite small (about 7000 training images). Anyway, thanks for your great work and your quick reply!
> FYI, I am using ERFNet-SAD-Pytorch now
Hey, could you tell me which repo this is? I'm also trying to use SAD with ERFNet. Did you write your own SAD code?
The PyTorch code does not contain the SAD part. You need to write it by yourself.
@cardwing Could you please clarify, or at least point to an existing implementation of, how the attention masks used for SAD are calculated? I have trouble interpreting the sentence from the paper (https://arxiv.org/abs/1908.00821) saying "Following [24], we also perform spatial softmax operation Φ G2sum". If I understand correctly, the result of G2sum is of size 1xHxW, so applying a spatial (element-wise) softmax would not make any sense, right? Thanks for the help!
Spatial softmax is performed over the HxW positions to normalize the pixel values. Please refer to previous issues for more details.
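Concretely, something like the following sketch (not the exact released code): G2sum collapses the channels by summing squared activations, the softmax is then taken over the flattened HxW positions of that single map, and the deeper block's map serves as the distillation target.

```python
import torch.nn.functional as F

def attention_map(feat):
    # G2sum: sum of squared activations over channels -> (N, H, W)
    am = feat.pow(2).sum(dim=1)
    n, h, w = am.shape
    # Spatial softmax: one softmax over the H*W positions of each map,
    # not an element-wise operation.
    return F.softmax(am.view(n, -1), dim=1).view(n, h, w)

def sad_loss(shallow_feat, deep_feat):
    source = attention_map(shallow_feat)
    target = attention_map(deep_feat).detach()  # deeper block is the teacher
    if source.shape[1:] != target.shape[1:]:
        # Match resolutions; whether to resize before or after the softmax
        # is a detail worth checking against the paper.
        source = F.interpolate(source.unsqueeze(1), size=target.shape[1:],
                               mode="bilinear", align_corners=False).squeeze(1)
    return F.mse_loss(source, target)
```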
> Hi @cardwing, I have two questions when training on my own data.
> - When training within the warmup epochs, everything went fine. But when I activated the SAD loss, model performance degraded immediately!
> - The exist loss is quite large during the training phase; it stays around 0.4000.
> Can you provide some advice for me? Thanks!

Hi @garryz94, I have some questions about the SAD MSE loss; could you help me, please? My SAD MSE loss is very big, about 300+. My MSELoss computes the loss between the attention map and the target, but the values in the attention map are in 0~1 after the softmax, while the segmentation label values are 0, 1, 2, 3, 4. Have I misunderstood the code? Can you provide some advice for me? Thanks!
Hi @noobliang, I set `reduction="mean"` when calculating the SAD MSE loss, so the loss is quite small.
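To illustrate the difference with toy tensors (and note that SAD compares two softmax-normalized attention maps, not an attention map against the raw 0/1/2/3/4 labels):

```python
import torch
import torch.nn as nn

# Two batches of flattened 36x100 attention maps, softmax-normalized over H*W.
student = torch.rand(4, 3600).softmax(dim=1)
teacher = torch.rand(4, 3600).softmax(dim=1)

print(nn.MSELoss(reduction="sum")(student, teacher))   # grows with N*H*W
print(nn.MSELoss(reduction="mean")(student, teacher))  # N*H*W times smaller
```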
> Hey, could you tell me which repo this is? I'm also trying to use SAD with ERFNet. Did you write your own SAD code?
@nisha1729 Sorry for the late reply. I use this repo: Codes-for-Lane-Detection/ERFNet-CULane-PyTorch/. You need to write your own code to calculate the SAD loss.
> Hi @noobliang, I set `reduction="mean"` when calculating the SAD MSE loss, so the loss is quite small.
Thank you for your reply and advice. But I found that SAD brings little improvement in accuracy. Did you get any improvement? I don't know where I went wrong.