Hello, sorry to bother you again. I see you added Linear Attention to cntkx, and I want to know how to implement the causal transformer mentioned in the related paper. Thank you very much!
Ahh yes, I didn't add causal masking to LinearAttention as it wasn't immediately clear to me how it should be implemented. So I'm still thinking about it! Sorry about that :(
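For what it's worth, here is a minimal NumPy sketch of one way causal linear attention can be done, assuming the "related paper" is the linear-attention formulation where a feature map `phi(x) = elu(x) + 1` replaces the softmax. Causality then comes from running cumulative sums over the keys and values instead of a mask (the function name `causal_linear_attention` and the `eps` stabilizer are my own choices, not cntkx API):

```python
import numpy as np

def elu_plus_one(x):
    # phi(x) = elu(x) + 1; strictly positive, so the normalizer never vanishes
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V, eps=1e-6):
    """Causal linear attention for a single sequence.

    Q, K: (T, d_k) queries/keys, V: (T, d_v) values.
    Position i attends only to positions j <= i via running sums:
        S_i = sum_{j<=i} phi(k_j) v_j^T,   z_i = sum_{j<=i} phi(k_j)
        out_i = (phi(q_i) @ S_i) / (phi(q_i) @ z_i)
    """
    Q, K = elu_plus_one(Q), elu_plus_one(K)
    T, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))   # running sum of outer products phi(k_j) v_j^T
    z = np.zeros(d_k)          # running sum of phi(k_j) for normalization
    out = np.empty((T, d_v))
    for i in range(T):
        S += np.outer(K[i], V[i])
        z += K[i]
        out[i] = (Q[i] @ S) / (Q[i] @ z + eps)
    return out
```

The nice property of this recurrent form is that it is equivalent to masking the full `phi(Q) @ phi(K).T` attention matrix with a lower-triangular mask and row-normalizing, but it runs in O(T) memory per step, which is what makes autoregressive inference cheap. A production version would vectorize this loop (e.g. with cumulative sums over `einsum` outputs) rather than iterating in Python.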