berlino / gated_linear_attention

MIT License

Question about masking #7

Closed Cranial-XIX closed 8 months ago

Cranial-XIX commented 8 months ago

Hi, I am very new to Triton code and am curious how the causal mask is implemented. Is it implicit in the Triton code because you use the cumulative-sum form? In particular, how do this line and the line below implement the causal masking?

sustcsonglin commented 8 months ago

For inter-chunk ops, there is no overlap between two consecutive chunks, so no causal mask is needed.

For intra-chunk ops, I apply one in https://github.com/berlino/gated_linear_attention/blob/main/kernels/intra_chunk_contribution/fn_only_gk.py#L205C1-L206C1
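
To illustrate the split between the two cases, here is a minimal NumPy sketch. It is not the repository's Triton kernel: it assumes plain, ungated linear attention, and the function names `chunked_causal_linear_attention` and `reference_causal_linear_attention` are hypothetical. The inter-chunk term reads a running state built only from earlier chunks, so it needs no mask. The intra-chunk term multiplies the scores by a lower-triangular mask.

```python
import numpy as np


def chunked_causal_linear_attention(q, k, v, chunk_size):
    """Ungated causal linear attention, computed chunk by chunk (illustrative only)."""
    seq_len, d_k = q.shape
    d_v = v.shape[1]
    out = np.zeros((seq_len, d_v))
    # Running sum of k_t^T v_t over all strictly earlier chunks.
    state = np.zeros((d_k, d_v))

    for start in range(0, seq_len, chunk_size):
        end = min(start + chunk_size, seq_len)
        qc, kc, vc = q[start:end], k[start:end], v[start:end]

        # Inter-chunk: `state` holds only past chunks, so every query in this
        # chunk may see all of it. No causal mask is required here.
        inter = qc @ state

        # Intra-chunk: queries and keys share the same chunk, so position i
        # must only see positions j <= i. A lower-triangular mask enforces this.
        scores = qc @ kc.T
        mask = np.tril(np.ones((end - start, end - start)))
        intra = (scores * mask) @ vc

        out[start:end] = inter + intra

        # Fold the current chunk into the state for the chunks that follow.
        state = state + kc.T @ vc

    return out


def reference_causal_linear_attention(q, k, v):
    """Same computation with one full-sequence causal mask, for comparison."""
    seq_len = q.shape[0]
    mask = np.tril(np.ones((seq_len, seq_len)))
    return ((q @ k.T) * mask) @ v
```

For random inputs, the chunked version should match the full-mask reference up to floating-point error, whatever the chunk size.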

Cranial-XIX commented 8 months ago

Thanks a lot for the extremely prompt reply :)