Closed John1231983 closed 5 years ago
Thanks for your careful reading. The paper does not seem to discuss the traditional attention technique we used before. Since high-level feature maps preserve class information while low-level feature maps keep high-resolution information, the idea here is to use the class information to guide the low-level features for the semantic segmentation task. Maybe this idea is the so-called 'attention'.
Actually, attention means probability. You should normalize it into the range of 0 to 1. Let's try it and see if it gives better performance.
Sir, can you tell me the result you got when training? I have the same setup (PyTorch version, 1080Ti), but I cannot reach the desired result of 73.38% with batch size 24 and 256*256 resolution, and batch size 4 is worse. Even after trying many settings, the best I got is just 75.6%.
@Chenfeng1271: I did not run it; I just looked at the code and saw the problem. Can you try adding softmax or sigmoid (recommended) to normalize the attention? I guess it will be better. Let me know the result when you are done.
In your code, I did not see the attention being normalized. This is usually done with softmax or sigmoid. Please check it:
https://github.com/JaveyWang/Pyramid-Attention-Networks-pytorch/blob/f719365c1780f062058dd0c94550c6c4766cd937/networks.py#L99
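For reference, a minimal sketch of what the suggested fix could look like. This is not the repo's actual module; `NormalizedAttention` and its channel arguments are hypothetical names. It assumes a high-level (class-aware) feature map gating a low-level feature map, with a sigmoid squashing every attention weight into (0, 1):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedAttention(nn.Module):
    """Hypothetical sketch: gate low-level features with an attention map
    derived from high-level features, normalized to (0, 1) via sigmoid."""

    def __init__(self, high_channels, low_channels):
        super().__init__()
        # 1x1 conv projects high-level features to one attention map
        # per low-level channel
        self.to_attention = nn.Conv2d(high_channels, low_channels, kernel_size=1)

    def forward(self, high_feat, low_feat):
        attn = self.to_attention(high_feat)
        # Upsample the attention map to the low-level spatial resolution
        attn = F.interpolate(attn, size=low_feat.shape[2:],
                             mode='bilinear', align_corners=False)
        # Sigmoid keeps every attention weight in (0, 1),
        # so the gated output can only attenuate, never amplify
        attn = torch.sigmoid(attn)
        return low_feat * attn
```

With sigmoid each spatial position is gated independently; softmax over channels would instead force the weights at each position to sum to 1, which is the other normalization discussed above.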