Jamie-Stirling / RetNet

An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
MIT License
1.14k stars 99 forks

NO LM HEAD #32

Closed: shnuhw closed 8 months ago

shnuhw commented 8 months ago

Thanks for this good job. The final output dim is the hidden dim, not the vocab size. Should an LM head be added to the code before training a language model?
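
For reference, here is a minimal sketch of the kind of wrapper I mean (the `RetNetLM` name, the constructor arguments, and the use of `nn.Identity()` as a stand-in backbone are all my own assumptions for illustration, not this repo's API):

```python
import torch
import torch.nn as nn

class RetNetLM(nn.Module):
    """Hypothetical LM wrapper: embed token IDs, run the RetNet
    backbone, then project hidden states to vocab-size logits."""
    def __init__(self, backbone, vocab_size, hidden_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)  # token IDs -> hidden vectors
        self.backbone = backbone                           # expects (batch, seq, hidden_dim)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)   # hidden vectors -> logits

    def forward(self, token_ids):
        x = self.embed(token_ids)   # (batch, seq) -> (batch, seq, hidden_dim)
        h = self.backbone(x)        # (batch, seq, hidden_dim)
        return self.lm_head(h)      # (batch, seq, vocab_size)

# Stand-in backbone for demonstration; the repo's RetNet would go here.
model = RetNetLM(nn.Identity(), vocab_size=10000, hidden_dim=512)
logits = model(torch.randint(0, 10000, (2, 16)))  # shape: (2, 16, 10000)
```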

shnuhw commented 8 months ago

I also do not understand why the input X needs a hidden-size dim. Can anyone explain? Thanks!
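
My guess (an assumption from how such backbones are usually packaged, not something this repo confirms): the released `RetNet` module consumes already-embedded vectors, so the caller has to map integer token IDs to hidden-size vectors first, e.g.:

```python
import torch
import torch.nn as nn

vocab_size, hidden_dim = 10000, 512  # illustrative sizes, not the repo's defaults
embed = nn.Embedding(vocab_size, hidden_dim)

token_ids = torch.randint(0, vocab_size, (2, 16))  # (batch, seq) integer token IDs
X = embed(token_ids)                               # (2, 16, 512): hidden-size input X
print(X.shape)                                     # torch.Size([2, 16, 512])
```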