Closed John1231983 closed 5 years ago
Thanks for your careful reading. The paper does not seem to discuss the traditional attention technique we used before. Since high-level feature maps preserve class information while low-level feature maps keep high-resolution information, the idea here is to use the class information to guide the low-level features for the semantic segmentation task. Maybe this idea is the so-called 'attention'.
Actually, attention means probability. You should normalize it into the range of 0 to 1. Let's try it and see if it gives better performance.
Sir, can you tell me the result you got when training? I have the same setup (PyTorch version, 1080Ti), but I cannot reach the desired result of 73.38% with batch size 24 and 256*256 resolution, and batch size 4 is worse. Even after trying many settings, the best I got is just 75.6%.
@Chenfeng1271: I did not run it; I just looked at the code and saw the problem. Can you try adding softmax or sigmoid (recommended) to normalize the attention? I guess it will be better. Let me know the result when you are done.
In your code, I did not see the attention being normalized. This is usually done with softmax or sigmoid. Please check it:
https://github.com/JaveyWang/Pyramid-Attention-Networks-pytorch/blob/f719365c1780f062058dd0c94550c6c4766cd937/networks.py#L99
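For reference, a minimal sketch of what the suggested fix could look like. This is not the repo's actual module; `NormalizedAttention` and its channel arguments are hypothetical names. It assumes a high-level (class-aware) feature map gating a low-level feature map, with a sigmoid squashing every attention weight into (0, 1):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedAttention(nn.Module):
    """Hypothetical sketch: gate low-level features with an attention map
    derived from high-level features, normalized to (0, 1) via sigmoid."""

    def __init__(self, high_channels, low_channels):
        super().__init__()
        # 1x1 conv projects high-level features to one attention map
        # per low-level channel
        self.to_attention = nn.Conv2d(high_channels, low_channels, kernel_size=1)

    def forward(self, high_feat, low_feat):
        attn = self.to_attention(high_feat)
        # Upsample the attention map to the low-level spatial resolution
        attn = F.interpolate(attn, size=low_feat.shape[2:],
                             mode='bilinear', align_corners=False)
        # Sigmoid keeps every attention weight in (0, 1),
        # so the gated output can only attenuate, never amplify
        attn = torch.sigmoid(attn)
        return low_feat * attn
```

With sigmoid each spatial position is gated independently; softmax over channels would instead force the weights at each position to sum to 1, which is the other normalization discussed above.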