XiaLiPKU / EMANet

The code for Expectation-Maximization Attention Networks for Semantic Segmentation (ICCV'2019 Oral)
https://xialipku.github.io/publication/expectation-maximization-attention-networks-for-semantic-segmentation/
GNU General Public License v3.0

no grad back propagate to EMAU.conv1? #14

Closed zhaokegg closed 4 years ago

zhaokegg commented 4 years ago

Excuse me, I cannot find the gradient flowing back to conv1. Is this a bug?

XiaLiPKU commented 4 years ago

> Excuse me, I cannot find the gradient flowing back to conv1. Is this a bug?

No bug here. It is due to https://github.com/XiaLiPKU/EMANet/blob/f7d7b4746104ea62bc5bd3186f6fcb8ea71e3579/network.py#L227. If you comment that line out, conv1 will receive gradients, but the performance may decrease a little. So far I don't know why having no gradient on conv1 works better.
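
For reference, here is a minimal sketch of that behaviour (a toy module written for this comment, not the repository's actual EMAU code; the name `TinyEMAU` and the shapes are made up). Because the EM iterations run inside `torch.no_grad()`, autograd records nothing downstream of conv1's output, so conv1 never receives a gradient while conv2 does:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEMAU(nn.Module):
    def __init__(self, c=8, k=4, stages=3):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 1)               # maps ReLU features (R^+) to R
        self.conv2 = nn.Conv2d(c, c, 1)
        self.mu = nn.Parameter(torch.randn(1, c, k))  # the k bases
        self.stages = stages

    def forward(self, x):
        x = self.conv1(x)
        b, c, h, w = x.shape
        feat = x.view(b, c, h * w)                    # b x c x n
        mu = self.mu.expand(b, -1, -1)                # b x c x k
        with torch.no_grad():                         # <- the line in question
            for _ in range(self.stages):
                z = F.softmax(feat.transpose(1, 2) @ mu, dim=2)        # E-step: b x n x k
                mu = feat @ (z / (1e-6 + z.sum(dim=1, keepdim=True)))  # M-step: b x c x k
        x_re = (mu @ z.transpose(1, 2)).view(b, c, h, w)               # re-estimated feature map
        return self.conv2(x_re)

m = TinyEMAU()
out = m(torch.relu(torch.randn(2, 8, 16, 16)))
out.sum().backward()
print(m.conv1.weight.grad)              # None: nothing propagated back to conv1
print(m.conv2.weight.grad is not None)  # True: conv2 is trained as usual
```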

zhaokegg commented 4 years ago

Can you provide a model to check? I think conv1 may not be included in your final model; without a gradient from the loss, it is pruned by PyTorch.

XiaLiPKU commented 4 years ago

> Can you provide a model to check? I think conv1 may not be included in your final model; without a gradient from the loss, it is pruned by PyTorch.

You can train a model without the line https://github.com/XiaLiPKU/EMANet/blob/f7d7b4746104ea62bc5bd3186f6fcb8ea71e3579/network.py#L227 and check for yourself. I don't agree with you: without a gradient from the loss, PyTorch simply doesn't update the parameters, but conv1 still works in the forward pass.
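
A small sketch of that point (hypothetical toy layers, not the EMANet code): a layer whose parameters receive no gradient still transforms the data in the forward pass; the optimizer just never updates it.

```python
import torch
import torch.nn as nn

lin = nn.Linear(4, 4)                  # stand-in for conv1: will get no gradient
head = nn.Linear(4, 1)                 # stand-in for the rest of the network
opt = torch.optim.SGD(list(lin.parameters()) + list(head.parameters()), lr=0.1)

x = torch.randn(8, 4)
with torch.no_grad():
    h = lin(x)                         # lin still transforms the input in forward
loss = head(h).pow(2).mean()
loss.backward()

before = lin.weight.clone()
opt.step()
print(torch.equal(lin.weight, before))   # True: lin keeps its initial weights
print(head.weight.grad is not None)      # True: head is updated normally
```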

valencebond commented 4 years ago

> > Can you provide a model to check? I think conv1 may not be included in your final model; without a gradient from the loss, it is pruned by PyTorch.
>
> You can train a model without the line https://github.com/XiaLiPKU/EMANet/blob/f7d7b4746104ea62bc5bd3186f6fcb8ea71e3579/network.py#L227 and check for yourself. I don't agree with you: without a gradient from the loss, PyTorch simply doesn't update the parameters, but conv1 still works in the forward pass.

But I think that if the parameters are never updated, conv1 makes no sense. Will it work properly using only the initial parameters of conv1? By the way, have you tried testing the model's performance after removing conv1?

XiaLiPKU commented 4 years ago

> > > Can you provide a model to check? I think conv1 may not be included in your final model; without a gradient from the loss, it is pruned by PyTorch.
> >
> > You can train a model without the line https://github.com/XiaLiPKU/EMANet/blob/f7d7b4746104ea62bc5bd3186f6fcb8ea71e3579/network.py#L227 and check for yourself. I don't agree with you: without a gradient from the loss, PyTorch simply doesn't update the parameters, but conv1 still works in the forward pass.
>
> But I think that if the parameters are never updated, conv1 makes no sense. Will it work properly using only the initial parameters of conv1? By the way, have you tried testing the model's performance after removing conv1?

Yes, I have. With the 'no_grad' setting, the only function of conv1 is to map the distribution of the input feature maps from R^+ to R.
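
A quick illustration of that mapping (toy tensors, not the actual network): features coming out of a ReLU are confined to R^+, while a plain 1x1 convolution with no activation after it produces values of either sign, i.e. covers R.

```python
import torch
import torch.nn as nn

feat = torch.relu(torch.randn(1, 8, 4, 4))   # ReLU output: every value >= 0
conv1 = nn.Conv2d(8, 8, 1)                   # plain 1x1 conv, no activation after it
mapped = conv1(feat)

print(feat.min().item() >= 0)                # True: input restricted to R^+
print((mapped < 0).any().item())             # almost surely True: output spans R
```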

valencebond commented 4 years ago

@XiaLiPKU Thanks for your quick reply. So the performance is a bit worse? Can you provide the concrete value?

XiaLiPKU commented 4 years ago

> @XiaLiPKU Thanks for your quick reply. So the performance is a bit worse? Can you provide the concrete value?

I forget the concrete value here, but as I remember, deleting the 'with torch.no_grad():' decreases mIoU by around 0.5. Moreover, without the conv1 layer the minimum possible inner product is 0, since the input features are non-negative. As there is an 'exp' operation inside the softmax, 0 becomes exp(0) = 1, so the corresponding z_nk cannot be close to 0. With the conv1 layer, the inner product can go down to -inf, and the corresponding z_nk can be very close to 0. Obviously, the latter is what we want. I haven't done an ablation study of conv1, but as analysed above, there should be some drop without it.
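
A numeric sketch of the softmax argument (hand-picked toy logits, not real z_nk values): when the logits are bounded below by 0, the smallest one still contributes exp(0) = 1, so its responsibility stays away from 0; once the logits can be strongly negative, the corresponding responsibility can be pushed arbitrarily close to 0.

```python
import torch
import torch.nn.functional as F

nonneg_logits = torch.tensor([0.0, 1.0, 2.0])    # inner products of R^+ features: >= 0
signed_logits = torch.tensor([-8.0, 1.0, 2.0])   # inner products after conv1: any sign

print(F.softmax(nonneg_logits, dim=0))  # smallest entry ~0.09, far from 0
print(F.softmax(signed_logits, dim=0))  # smallest entry ~3e-5, very close to 0
```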

valencebond commented 4 years ago

@XiaLiPKU Thanks for the detailed explanation ~ great job!