Closed: aporia3517 closed this issue 3 years ago
In akt.py, follow Line 159 => Line 208 => Line 281 => Line 326: this path masks the current and future responses. When mask=0 at Line 159, src_mask at Line 208 is upper triangular, and it is lower triangular (masking the current step) at Line 324.
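For anyone following along, here is a minimal sketch of how an integer `mask` offset can switch between "hide future steps only" and "hide the current step too". It assumes the mask is built with `np.triu(..., k=mask)` and then inverted; `build_src_mask` is a hypothetical helper written for illustration, not code copied from akt.py, so check it against the actual lines referenced above.

```python
import numpy as np

def build_src_mask(seqlen, mask):
    # Hypothetical helper (an assumption, not taken from akt.py).
    # Ones on and above the k-th diagonal mark positions a query must NOT see.
    nopeek = np.triu(np.ones((seqlen, seqlen)), k=mask).astype("uint8")
    # True where attention is allowed: the complement of the upper triangle.
    return nopeek == 0

# mask=0: the diagonal is blocked too, so step t only sees steps before t.
print(build_src_mask(4, mask=0).astype(int))
# mask=1: the diagonal is allowed, so step t sees steps up to and including t.
print(build_src_mask(4, mask=1).astype(int))
```

In this sketch the blocked region is upper triangular and the allowed region is lower triangular; with `mask=0` the first row has no allowed position at all.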
Thanks for the reply
Hi @arghosh, thank you for sharing!
I'm confused about your masking.
In akt.py, Line 328:
In this line, the cat operation only replaces the first row of the masking matrix with zero padding, while the other rows stay the same.
Would it be like this?
I think it sets the first row to all zeros rather than the first column of scores. Prior to this operation, the first row of scores is filled with a very small value (-1e32). After softmax that row is still not zero (a row of identical values softmaxes to a uniform distribution, not zeros), so this operation is essential.
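A small NumPy sketch of this point, under stated assumptions: the scores are placeholders (all zeros), the allowed region is strictly lower triangular, and `np.concatenate` stands in for the cat call discussed above. It is an illustration of the idea, not the code from akt.py.

```python
import numpy as np

def softmax(x):
    # Softmax over the last axis, shifted by the row max for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seqlen = 4
raw = np.zeros((seqlen, seqlen))  # placeholder attention scores (assumption)
# Strictly lower triangular: step t may only attend to steps before t.
allowed = np.tril(np.ones((seqlen, seqlen)), k=-1) == 1

# Fill blocked positions with a very small value, then normalize each row.
scores = softmax(np.where(allowed, raw, -1e32))
first_row_before = scores[0].copy()  # fully masked row: uniform, not zero

# Zero padding: swap the first row for zeros and keep the remaining rows.
pad_zero = np.zeros((1, seqlen))
scores = np.concatenate([pad_zero, scores[1:, :]], axis=0)

print(first_row_before)
print(scores[0])
```

The first print shows that the fully masked row still carries attention weight after softmax; the second shows that the concatenation removes it while leaving every other row untouched.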
Hi @arghosh, First of all, thank you for sharing your great work.
Regarding #3, I'm wondering which part of the source code masks or slices the responses (targets).
That is, the part that lets the model use only responses 1:t-1 when predicting the t-th response: p(r_t | q_{1:t-1}, r_{1:t-1}).