SoulProficiency / speechseparation-Sandglasset


Why is there no mask needed in global self-attention? #2

Closed · yjiangling closed this issue 2 years ago

yjiangling commented 3 years ago

Hi,

I'm wondering why, when conducting global self-attention in the Sandglasset block, the attn_mask is set to None. If the samples in a batch do not all have the same length, the shorter samples will include padded content in the self-attention computation, so it seems something could go wrong there. Hope to get your reply, thanks a lot.
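
To make the concern concrete: a minimal sketch (plain PyTorch; the zero-padding and shapes below are illustrative, not code from this repo) showing that unmasked softmax attention assigns nonzero weight to padded key positions:

```python
import torch
import torch.nn.functional as F

# A length-2 sequence zero-padded to length 4 (feature dim 8).
x = torch.randn(4, 8)
x[2:] = 0.0  # padded frames

# Unmasked scaled dot-product self-attention.
scores = (x @ x.T) / (8 ** 0.5)      # (4, 4) attention scores
weights = F.softmax(scores, dim=-1)  # each row sums to 1

# The real frames (rows 0-1) still put probability mass on the padded
# keys (columns 2-3), so padding leaks into their attention outputs.
print(weights[:2, 2:])  # strictly positive entries
```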
SoulProficiency commented 3 years ago

got it, I will reply as soon as possible.

SoulProficiency commented 2 years ago

Sorry for my late reply, I have had a lot of work recently. That's a good question, and I cannot answer it definitively because I'm not the original author (I had hoped the authors would release code, but they didn't); I just followed the paper and reproduced the modeling method. I set the mask to None for two reasons: 1. to simplify my work (the paper does not share details about the mask, so it is hard for me to solve this problem); 2. to leave a more flexible choice for readers. If you can provide a suitable answer, please leave a comment! Thanks

yjiangling commented 2 years ago

OK, thanks a lot for the detailed reply. In my work, I added the mask.
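
For readers who hit the same issue: one common way to add such a mask, sketched here under the assumption that the global self-attention wraps PyTorch's nn.MultiheadAttention (the helper make_key_padding_mask and all shapes are illustrative, not code from this repo):

```python
import torch
import torch.nn as nn

def make_key_padding_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """Boolean mask of shape (batch, max_len); True marks padded positions."""
    positions = torch.arange(max_len, device=lengths.device)  # (max_len,)
    return positions.unsqueeze(0) >= lengths.unsqueeze(1)     # (batch, max_len)

# Illustrative shapes: 3 sequences padded to length 6, feature dim 16.
lengths = torch.tensor([6, 4, 2])
x = torch.randn(3, 6, 16)  # (batch, time, features)

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True)
mask = make_key_padding_mask(lengths, max_len=x.size(1))

# Positions where key_padding_mask is True are ignored by the attention,
# so padded frames of the shorter samples cannot contaminate real frames.
out, attn = mha(x, x, x, key_padding_mask=mask)
```

Note that nn.MultiheadAttention's key_padding_mask masks whole key positions per batch element, which is what variable-length batches need here; attn_mask, by contrast, applies one query-by-key pattern shared across the batch.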