Hello, sorry to bother you again. I see you added Linear Attention to cntkx, and I want to know how to implement the causal transformer mentioned in the related paper. Thank you very much!
Ahh yes, I didn't add causal masking to LinearAttention as it wasn't immediately clear to me how it should be implemented. So I'm still thinking about it! Sorry about that :(
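For what it's worth, here is a minimal NumPy sketch of one way causal linear attention can be done, assuming the "related paper" is the linear-attention formulation where a feature map `phi(x) = elu(x) + 1` replaces the softmax. Causality then comes from running cumulative sums over the keys and values instead of a mask (the function name `causal_linear_attention` and the `eps` stabilizer are my own choices, not cntkx API):

```python
import numpy as np

def elu_plus_one(x):
    # phi(x) = elu(x) + 1; strictly positive, so the normalizer never vanishes
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V, eps=1e-6):
    """Causal linear attention for a single sequence.

    Q, K: (T, d_k) queries/keys, V: (T, d_v) values.
    Position i attends only to positions j <= i via running sums:
        S_i = sum_{j<=i} phi(k_j) v_j^T,   z_i = sum_{j<=i} phi(k_j)
        out_i = (phi(q_i) @ S_i) / (phi(q_i) @ z_i)
    """
    Q, K = elu_plus_one(Q), elu_plus_one(K)
    T, d_k = Q.shape
    d_v = V.shape[1]
    S = np.zeros((d_k, d_v))   # running sum of outer products phi(k_j) v_j^T
    z = np.zeros(d_k)          # running sum of phi(k_j) for normalization
    out = np.empty((T, d_v))
    for i in range(T):
        S += np.outer(K[i], V[i])
        z += K[i]
        out[i] = (Q[i] @ S) / (Q[i] @ z + eps)
    return out
```

The nice property of this recurrent form is that it is equivalent to masking the full `phi(Q) @ phi(K).T` attention matrix with a lower-triangular mask and row-normalizing, but it runs in O(T) memory per step, which is what makes autoregressive inference cheap. A production version would vectorize this loop (e.g. with cumulative sums over `einsum` outputs) rather than iterating in Python.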