OpenNLPLab / cosFormer

[ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention
Apache License 2.0

Why is the input [s b dim] and not [b s dim]? #11

Closed Zyriix closed 1 year ago

Zyriix commented 1 year ago

This would be more general (from my perspective). With q, k, v of shape (b, s, d):

```python
q = q.contiguous()
q = rearrange(q, 'b n (h d) -> (b h) n d', h=self.num_heads)
# (N * h, S, d)

k = k.contiguous()
k = rearrange(k, 'b n (h d) -> (b h) n d', h=self.num_heads)
# (N * h, S, d)

v = v.contiguous()
v = rearrange(v, 'b n (h d) -> (b h) n d', h=self.num_heads)
```

Doraemonzzz commented 1 year ago

This is to align with the input format of fairseq, whose attention modules expect sequence-first (s, b, d) tensors.
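
For anyone holding batch-first data, a minimal sketch of converting a (b, s, d) tensor to the fairseq-style (s, b, d) layout before calling the module; the commented-out `CosformerAttention` call and its constructor arguments are assumptions for illustration, not taken from this repo's code:

```python
import torch

def to_seq_first(x: torch.Tensor) -> torch.Tensor:
    # Convert a batch-first (b, s, d) tensor to the sequence-first
    # (s, b, d) layout that fairseq-style attention modules expect.
    return x.transpose(0, 1).contiguous()

b, s, d = 2, 16, 64
q = torch.randn(b, s, d)   # batch-first, as in the question above
k = torch.randn(b, s, d)
v = torch.randn(b, s, d)

q_sf, k_sf, v_sf = map(to_seq_first, (q, k, v))   # each now (s, b, d)
assert q_sf.shape == (s, b, d)

# Hypothetical usage, assuming an attention class along these lines:
# attn = CosformerAttention(embed_dim=d, num_heads=4)
# out = attn(q_sf, k_sf, v_sf)       # expected output shape: (s, b, d)
# out = out.transpose(0, 1)          # back to (b, s, d) if needed downstream
```

Transposing at the call site keeps the fairseq-compatible interface intact while letting downstream code stay batch-first.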