Modalities / modalities

A framework for training multimodal foundation models.
MIT License

feat: group-query-attention implementation #72

Closed luzian-hahn closed 6 months ago

luzian-hahn commented 6 months ago

Re-opened version of #41.

Potential solution for handling the combination of GQA and FlashAttention: https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html
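The linked SDPA route can be sketched as follows. This is a minimal illustration, not code from this PR: the head counts and shapes are invented, and the KV-head expansion via `repeat_interleave` is one common way to feed grouped-query attention into `scaled_dot_product_attention` (newer PyTorch releases also expose an `enable_gqa` flag that avoids the explicit expansion).

```python
# Sketch: combining grouped-query attention (GQA) with PyTorch SDPA,
# which dispatches to a fused FlashAttention-style kernel when eligible.
# All shapes/head counts below are illustrative assumptions.
import torch
import torch.nn.functional as F

batch, seq_len, head_dim = 2, 16, 64
n_query_heads, n_kv_heads = 8, 2  # GQA: groups of query heads share one KV head

q = torch.randn(batch, n_query_heads, seq_len, head_dim)
k = torch.randn(batch, n_kv_heads, seq_len, head_dim)
v = torch.randn(batch, n_kv_heads, seq_len, head_dim)

# Expand each KV head so it is shared by n_query_heads // n_kv_heads query heads.
group_size = n_query_heads // n_kv_heads
k_exp = k.repeat_interleave(group_size, dim=1)
v_exp = v.repeat_interleave(group_size, dim=1)

out = F.scaled_dot_product_attention(q, k_exp, v_exp, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```

On recent PyTorch versions the expansion can instead be left to the kernel by passing `k` and `v` unexpanded with `enable_gqa=True`, which avoids materializing the repeated KV tensors.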

luzian-hahn commented 6 months ago

Unfortunately, re-opening this PR does not resolve the issue with the 100 files. In addition, the merge conflicts that pop up cannot be resolved locally in a way that changes the state of this PR, so please ignore them for now; they will be fixed during the merge itself later.

(Resolving the conflicts locally resulted in an empty commit, which did not change anything on the branch and caused the conflicts to re-occur. I am not quite sure why this is happening.)

flxst commented 6 months ago

Closed and replaced by #74.