arghosh / AKT

Masking or slicing allowing the model to use previous interactions #7

Closed aporia3517 closed 3 years ago

aporia3517 commented 3 years ago

Hi @arghosh, First of all, thank you for sharing your great work.

Regarding #3, I'm wondering which part of the source code is masking or slicing the responses (targets).

That is, the part that lets the model use responses 1:t-1 when predicting the t-th response: p(r_t | q_{1:t-1}, r_{1:t-1}).

arghosh commented 3 years ago

In akt.py, follow Line 159 => Line 208 => Line 281 => Line 326; this path masks the current and future responses. When mask=0 in L159, src_mask in L208 is upper triangular, and it is lower triangular (masking the current step) in line 324.
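The difference between masking only future steps and masking the current step as well can be sketched in plain Python. This is an illustration, not the code in akt.py: the helper name `allowed_positions` is hypothetical, and it assumes the blocked region is the upper triangle selected by a diagonal offset `k` (the same convention as `np.triu(..., k=k)`), with attention allowed on the complement.

```python
def allowed_positions(seqlen, k):
    """Return a seqlen x seqlen boolean matrix.

    Entry [i][j] is True when query step i may attend to key step j.
    Assumption: the blocked region is the upper triangle with j - i >= k,
    so the allowed region is its complement, j - i < k.
    """
    return [[(j - i) < k for j in range(seqlen)] for i in range(seqlen)]


# k=0: the diagonal is blocked too, so step i sees only steps j < i.
strict = allowed_positions(4, k=0)
# k=1: only future steps are blocked, so step i sees steps j <= i.
causal = allowed_positions(4, k=1)

for name, matrix in (("k=0", strict), ("k=1", causal)):
    print(name)
    for row in matrix:
        print([int(v) for v in row])
```

With `k=0` the first row has no allowed position at all, which is why the first step needs special handling after softmax.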

aporia3517 commented 3 years ago

Thanks for the reply

ghzha0 commented 2 years ago

Hi @arghosh, thank you for sharing!

I am confused about your masking.

In akt.py, line 328:

image

In this line, the cat operation only replaces the first row of the masking matrix with zero padding, while the other rows stay the same.

Would it be like this?

image

lif323 commented 10 months ago

I think it sets the first row of scores to all zeros, not the first column. Before this operation, every entry in the first row of scores is filled with a very negative value (-1e32). Softmax over a row of identical values yields uniform nonzero weights, not zeros, so this operation is essential.
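The effect on the first row can be checked with a small sketch in plain Python. It is independent of akt.py: the names `softmax` and `masked_attention` are hypothetical, the strict mask (step i sees only steps j < i) is an assumption, and replacing the first row of a list of lists stands in for the cat operation discussed above.

```python
import math

NEG = -1e32  # fill value for masked positions, as mentioned in the thread


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(scores, zero_pad):
    """Apply a strict causal mask, then softmax each row.

    If zero_pad is True, the first row of weights is replaced with zeros
    and all other rows are left unchanged.
    """
    seqlen = len(scores)
    masked = [
        [s if j < i else NEG for j, s in enumerate(row)]
        for i, row in enumerate(scores)
    ]
    weights = [softmax(row) for row in masked]
    if zero_pad:
        weights = [[0.0] * seqlen] + weights[1:]
    return weights


scores = [[0.5, 1.0, -0.3], [0.2, 0.1, 0.9], [1.5, -0.7, 0.4]]
without_pad = masked_attention(scores, zero_pad=False)
with_pad = masked_attention(scores, zero_pad=True)

print(without_pad[0])  # uniform weights: the fully masked row is not zero
print(with_pad[0])     # explicitly zeroed first row
```

Without the zeroing step, the fully masked first row comes out of softmax as equal weights over every position, including the current and future ones; the remaining rows are the same in both cases.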