Jamie-Stirling / RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
MIT License
1.16k stars, 100 forks

The complex theta should cancel out #28

Open albertbuchard opened 1 year ago

albertbuchard commented 1 year ago

Maybe I am missing something, but do we need Theta at all? Since its magnitude is 1, multiplying it by its conjugate should cancel it out in the parallel version.
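
A quick way to test this is to look at the phase factor that ends up on a single attention-score entry. The sketch below is not the repository's code. It assumes the paper's definition, where the query at position n is multiplied by Theta_n = e^{i n theta} and the key at position m by the conjugate of Theta_m. The names `theta`, `phase_product`, and `angle` are made up for illustration, and `angle` stands in for one component of the theta vector.

```python
import cmath


def theta(n, angle):
    """Phase factor e^{i * n * angle} for position n (always unit magnitude)."""
    return cmath.exp(1j * n * angle)


def phase_product(n, m, angle):
    """Theta_n times conj(Theta_m): the factor on the (n, m) score entry."""
    return theta(n, angle) * theta(m, angle).conjugate()


angle = 0.3  # arbitrary illustrative value, not taken from the repository

same = phase_product(2, 2, angle)  # query and key at the same position
diff = phase_product(5, 2, angle)  # query three positions after the key

print("n == m:", same, "magnitude", abs(same))
print("n != m:", diff, "magnitude", abs(diff))
```

Under that assumption the product has magnitude 1 everywhere, but it equals exactly 1 only where n = m. Off the diagonal it is e^{i (n - m) theta}, so a relative phase between positions remains.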