Hi, @CRISZJ Thanks for your interest in our work. The released pre-trained model uses the original self-attention layer, i.e., a selective ratio of 1.0, to get the best performance. You can replace the attention layer in model.py with the selective attention layer, whose implementation we give at L288 in model.py (https://github.com/XLechter/SDT/blob/b8fe7ed7e4eb0cb54271baee510f1d9d833dbfe0/models/model.py#L288), but it will reduce the performance depending on the selective ratio you choose.
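For anyone who wants to experiment before digging into the repo, here is a minimal sketch of the general idea behind a ratio-based selective attention layer (this is not the exact implementation at L288 of model.py; the class name, module structure, and masking strategy below are assumptions): each query keeps only its top-k attention scores, where k is set by the selective ratio, and a ratio of 1.0 reduces to ordinary self-attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveAttention(nn.Module):
    """Sketch of a selective self-attention layer.

    For each query, only the top-k attention scores are kept
    (k = selective_ratio * number_of_keys); the remaining scores
    are masked out before the softmax. With selective_ratio = 1.0
    this is plain self-attention.
    """
    def __init__(self, dim, selective_ratio=0.5):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        self.selective_ratio = selective_ratio
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (batch, num_points, dim)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale  # (B, N, N)

        n = scores.shape[-1]
        top_k = max(1, int(self.selective_ratio * n))
        if top_k < n:
            # Keep only the top-k scores per query; push the rest to -inf
            # so they vanish after the softmax.
            kth_score = scores.topk(top_k, dim=-1).values[..., -1:]
            scores = scores.masked_fill(scores < kth_score, float('-inf'))

        attn = F.softmax(scores, dim=-1)
        return self.out_proj(torch.matmul(attn, v))

# Example usage on a random point-feature tensor:
# layer = SelectiveAttention(dim=256, selective_ratio=0.5)
# out = layer(torch.randn(2, 2048, 256))  # (2, 2048, 256)
```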
Okay, thanks for your reply. I have another question after reading your paper. In Section 4.2 you write, 'Adding a position encoding layer can significantly boost the performance for finding long-range relations'. Could you please tell me the performance gap between PE and no PE? :-)
@CRISZJ It seems I didn't give an ablation study on the PE in the paper. It's about 0.2 CD if I remember correctly, which may not be that 'significant' 🤣
OK, thanks for your reply again. Got it.
Hello, thanks for open-sourcing such excellent work. When I read your code, I found that it does not seem to use the Selective Attention Mechanism; instead, it uses Cross-Attention. Am I misunderstanding something?