Jamie-Stirling / RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
MIT License
1.14k stars 99 forks

NO LM HEAD #32

Closed: shnuhw closed 8 months ago

shnuhw commented 8 months ago

Thanks for this good job. The final output dim is the hidden dim, not the vocab size. Should an LM head be added to the code before training a language model?
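
For reference, here is a minimal sketch of the kind of wrapper I mean (the `RetNetLM` name, the constructor arguments, and the use of `nn.Identity()` as a stand-in backbone are all my own assumptions for illustration, not this repo's API):

```python
import torch
import torch.nn as nn

class RetNetLM(nn.Module):
    """Hypothetical LM wrapper: embed token IDs, run the RetNet
    backbone, then project hidden states to vocab-size logits."""
    def __init__(self, backbone, vocab_size, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)  # token IDs -> hidden vectors
        self.backbone = backbone                           # expects (batch, seq, hidden_dim)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)   # hidden vectors -> logits

    def forward(self, token_ids):
        x = self.embed(token_ids)   # (batch, seq) -> (batch, seq, hidden_dim)
        h = self.backbone(x)        # (batch, seq, hidden_dim)
        return self.lm_head(h)      # (batch, seq, vocab_size)

# Stand-in backbone for demonstration; the repo's RetNet would go here.
model = RetNetLM(nn.Identity(), vocab_size=10000, hidden_dim=512)
logits = model(torch.randint(0, 10000, (2, 16)))  # shape: (2, 16, 10000)
```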

shnuhw commented 8 months ago

I also do not understand why the input X needs a hidden-size dim. Can anyone explain? Thanks!
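
My guess (an assumption from how such backbones are usually packaged, not something this repo confirms): the released `RetNet` module consumes already-embedded vectors, so the caller has to map integer token IDs to hidden-size vectors first, e.g.:

```python
import torch
import torch.nn as nn

vocab_size, hidden_dim = 10000, 512  # illustrative sizes, not the repo's defaults
embed = nn.Embedding(vocab_size, hidden_dim)

token_ids = torch.randint(0, vocab_size, (2, 16))  # (batch, seq) integer token IDs
X = embed(token_ids)                               # (2, 16, 512): hidden-size input X
print(X.shape)                                     # torch.Size([2, 16, 512])
```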