Jamie-Stirling / RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
MIT License
1.14k stars, 99 forks

What about cross-attention? #17

Status: Open. aki819 opened this issue 11 months ago.

aki819 commented 11 months ago

Can this model perform cross-attention, similar to how the Transformer attends across embedding matrices from different modalities?
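For reference, here is a minimal sketch of the Transformer cross-attention the question refers to. It is written in dependency-free Python and is not code from this repository. The function names, shapes, and weights are hypothetical and serve only as an illustration. Queries are projected from one modality's embeddings (for example, text tokens). Keys and values are projected from another modality's embeddings (for example, image patches). The two embedding matrices can therefore have different widths, as long as both are projected to a shared key dimension `d_k`.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m).
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def cross_attention(x_query: Matrix, x_context: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    # Queries come from one modality (hypothetically, text embeddings) ...
    q = matmul(x_query, w_q)
    # ... while keys and values come from another (hypothetically, image embeddings).
    k = matmul(x_context, w_k)
    v = matmul(x_context, w_v)
    d_k = len(w_q[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    # No causal mask: every query position may attend to every context position.
    weights = softmax_rows(scores)
    return matmul(weights, v)


if __name__ == "__main__":
    # Hypothetical inputs: 3 text tokens of width 4, 5 image patches of width 6.
    text = [[0.1 * (i + j) for j in range(4)] for i in range(3)]
    image = [[0.05 * (i * j) for j in range(6)] for i in range(5)]
    w_q = [[0.1 * (i - j) for j in range(2)] for i in range(4)]   # 4 -> d_k = 2
    w_k = [[0.1 * (i + j) for j in range(2)] for i in range(6)]   # 6 -> d_k = 2
    w_v = [[0.1 * (i * j + 1) for j in range(3)] for i in range(6)]  # 6 -> 3
    out = cross_attention(text, image, w_q, w_k, w_v)
    print(len(out), len(out[0]))  # one output row per text token
```

The output has one row per query token, and each row is a weighted mix of the other modality's value vectors. In RetNet's parallel form, the softmax is replaced by a causal decay mask over positions within a single sequence. Any cross-modal variant would therefore need its own choice of how, or whether, to decay across modalities. The sketch above does not settle that question.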