Closed · deepzlk closed this issue 3 years ago
```python
attn = (q @ k.transpose(-2, -1)) * self.scale
attn = attn.softmax(dim=-1)
attn = self.attn_drop(attn)
x = (attn @ v).transpose(1, 2).reshape(B, N, C)
```
Although there is no explicit concatenation in the code, the lines above do implement multi-head self-attention: `q`, `k`, and `v` already carry a separate head dimension (shape `(B, num_heads, N, C // num_heads)`), so the batched matmuls attend over all heads in parallel, and the final `transpose(1, 2).reshape(B, N, C)` merges the heads back together, which is equivalent to concatenating the per-head outputs.
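Here is a minimal standalone sketch (shapes `B`, `H`, `N`, `head_dim` are illustrative, not taken from the repo) showing why the `transpose` + `reshape` gives the same result as an explicit `torch.cat` over the head outputs:

```python
import torch

# Illustrative sizes: B batches, H heads, N tokens, head_dim per-head channels.
B, H, N, head_dim = 2, 8, 4, 16
C = H * head_dim

q = torch.randn(B, H, N, head_dim)
k = torch.randn(B, H, N, head_dim)
v = torch.randn(B, H, N, head_dim)

scale = head_dim ** -0.5

# The head dimension H acts as an extra batch dimension,
# so all H heads are attended to in parallel.
attn = (q @ k.transpose(-2, -1)) * scale          # (B, H, N, N)
attn = attn.softmax(dim=-1)
out = attn @ v                                    # (B, H, N, head_dim)

# Merging heads via transpose + reshape ...
merged = out.transpose(1, 2).reshape(B, N, C)     # (B, N, C)

# ... matches an explicit concatenation of the per-head outputs.
explicit = torch.cat([out[:, h] for h in range(H)], dim=-1)  # (B, N, C)
print(torch.allclose(merged, explicit))           # True
```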
It seems that the multi-head attention does not actually implement `num_heads=8`?