cschenxiang / DRSformer

Learning A Sparse Transformer Network for Effective Image Deraining (CVPR 2023)

Question about Top-K #2

Closed wangjg33 closed 1 year ago

wangjg33 commented 1 year ago

Congratulations, this is an excellent job!

I have a question after reading the code and the paper. The paper mentions that the k parameter of top-k can be dynamically learned, but in the code it appears to be a fixed K value. The relevant code should be lines 140 to 154 of the DRSformer_arch.py file, here.

Could you please help me understand how K mentioned in the paper is dynamically learned?
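(For context, the fixed-k pattern being described looks roughly like the sketch below; the tensor shapes and the 2/3 fraction are illustrative assumptions, not a copy of the repository code.)

```python
import torch

# Toy channel-attention map: (batch, heads, C, C)
attn = torch.randn(2, 4, 16, 16)
C = attn.shape[-1]

# k is computed from a fixed fraction of C, not predicted by the network.
k = int(C * 2 / 3)
index = torch.topk(attn, k=k, dim=-1, largest=True)[1]
mask = torch.zeros_like(attn).scatter_(-1, index, 1.0)

# Scores outside the top-k are suppressed before the softmax.
sparse_attn = torch.where(mask > 0, attn, torch.full_like(attn, float('-inf')))
sparse_attn = sparse_attn.softmax(dim=-1)
```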

cschenxiang commented 1 year ago


Thanks for your attention! Here, K is effectively obtained as a weighted average over several proper fractions, rather than being a single value. In other words, it is the sparsity level that is dynamically learnable, not K itself.
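(A minimal sketch of this weighted-combination idea, assuming four top-k branches with fractions {1/2, 2/3, 3/4, 4/5} and one learnable scalar weight per branch; the class name, exact fractions, and shapes are illustrative, not taken verbatim from DRSformer_arch.py.)

```python
import torch
import torch.nn as nn

class MultiBranchTopKAttention(nn.Module):
    """Channel-wise attention with several fixed top-k sparsity levels
    whose outputs are mixed by learnable scalar weights."""

    def __init__(self, num_heads):
        super().__init__()
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        # One learnable mixing weight per branch: the network learns how
        # much each sparsity level contributes, so the effective sparsity
        # is what gets learned, not any single k.
        self.weights = nn.Parameter(torch.ones(4))

    @staticmethod
    def topk_branch(attn, k, v):
        # Keep the k largest scores per row, mask the rest to -inf,
        # renormalize with softmax, then aggregate the values.
        index = torch.topk(attn, k=k, dim=-1, largest=True)[1]
        mask = torch.zeros_like(attn).scatter_(-1, index, 1.0)
        attn = torch.where(mask > 0, attn, torch.full_like(attn, float('-inf')))
        return attn.softmax(dim=-1) @ v

    def forward(self, q, k, v):
        # q, k, v: (batch, heads, C, N) -- attention is taken over channels,
        # so the map below is (batch, heads, C, C).
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        C = attn.shape[-1]
        fractions = (1 / 2, 2 / 3, 3 / 4, 4 / 5)  # assumed sparsity levels
        outs = [self.topk_branch(attn, max(int(C * f), 1), v) for f in fractions]
        # Learnable weighted combination of the four sparse outputs.
        return sum(w * o for w, o in zip(self.weights, outs))

# Usage: q = k = v = torch.randn(1, 4, 16, 64); out has the same shape as v.
```

Under this reading, the learnable scalar weights are what implement the "weighted average of proper fractions" described in the reply.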

wangjg33 commented 1 year ago

Thank you for your reply. I think I understand what you mean. But I also noticed something else in the code here: attention values falling in the (4/5, 1) range survive all four top-k masks, so they are summed four times and their contribution to the final "out" is quadrupled. Likewise, values in (3/4, 4/5) and (2/3, 3/4) are amplified by factors of 3 and 2, respectively. Is that right? Additionally, doesn't splitting the attention matrix into four separate calculations increase the computational cost roughly four times?
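(A toy check of the overlap described above; this just counts how many of the four assumed top-k branches retain each score, using the same illustrative fractions as in the earlier sketch.)

```python
import torch

torch.manual_seed(0)
C = 20
row = torch.randn(C)  # one row of a toy attention map

# Count how many of the four assumed top-k branches keep each score.
kept = torch.zeros(C)
for frac in (1 / 2, 2 / 3, 3 / 4, 4 / 5):
    idx = torch.topk(row, k=int(C * frac))[1]
    kept[idx] += 1

# The highest-ranked scores are retained by all four branches, so a plain
# sum of branch outputs would weight them most heavily; lower-ranked
# scores survive only 3, 2, or 1 of the branches.
print(kept)
```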