Closed shnuhw closed 8 months ago
Thanks for the great work. The final output dim is the hidden dim, not the vocab size. Should an LM head be added to the code before training a language model?
I don't understand why the input X needs to have the hidden-size dim. Can anyone explain? Thanks!
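For anyone reading along, here is a rough, dependency-free sketch of the shapes both questions are about. It assumes the usual language-model layout, which this thread does not confirm for this repo: token ids are looked up in an embedding table (so X already has the hidden-size dim when it enters the model), and a separate LM head projects the hidden-dim output to vocab-size logits. All names and sizes (`embed`, `project_to_vocab`, `vocab_size`, `hidden_dim`) are made up for illustration and are not from the repo's code.

```python
import random

# Illustrative sizes only; not taken from the repository.
vocab_size, hidden_dim = 10, 4

random.seed(0)
# Embedding table: one hidden_dim vector per vocabulary entry.
embedding = [[random.gauss(0, 1) for _ in range(hidden_dim)]
             for _ in range(vocab_size)]
# Hypothetical LM head weights mapping hidden_dim -> vocab_size.
lm_head = [[random.gauss(0, 1) for _ in range(vocab_size)]
           for _ in range(hidden_dim)]


def embed(token_ids):
    """Map token ids to vectors: (seq_len,) -> (seq_len, hidden_dim)."""
    return [embedding[t] for t in token_ids]


def project_to_vocab(hidden_states):
    """Apply the LM head: (seq_len, hidden_dim) -> (seq_len, vocab_size)."""
    return [
        [sum(h[i] * lm_head[i][v] for i in range(hidden_dim))
         for v in range(vocab_size)]
        for h in hidden_states
    ]


token_ids = [1, 5, 7]
x = embed(token_ids)               # this is the hidden-size input X
hidden = x                         # stand-in for the model body (identity here)
logits = project_to_vocab(hidden)  # one score per vocabulary entry

print(len(x), len(x[0]))            # 3 4
print(len(logits), len(logits[0]))  # 3 10
```

If the model body only returns hidden-dim vectors, a projection like `project_to_vocab` (or an equivalent linear layer) would be needed on top before computing a cross-entropy loss over the vocabulary.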