bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

convert reshape to view #73

Closed mayank31398 closed 1 year ago

mayank31398 commented 1 year ago

Is this still needed? I ran 100 training steps and got the same LM loss with both `reshape` and `view`.
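For context, here is a minimal PyTorch sketch of where the two calls can differ, assuming "both" refers to `reshape` and `view` as in the PR title. The tensors and the `same_storage` helper are illustrative and not taken from Megatron-LM. On a contiguous tensor the two return the same values without copying, which would be consistent with an identical loss. They diverge only on non-contiguous input, where `view` raises an error and `reshape` falls back to a copy.

```python
import torch


def same_storage(a: torch.Tensor, b: torch.Tensor) -> bool:
    """Hypothetical helper: True when both tensors start at the same memory address."""
    return a.data_ptr() == b.data_ptr()


# A contiguous 3x4 tensor (illustrative data only).
x = torch.arange(12, dtype=torch.float32).reshape(3, 4)

# On contiguous input, view and reshape both return a view: same values, no copy.
v = x.view(4, 3)
r = x.reshape(4, 3)
contiguous_equal = torch.equal(v, r)
contiguous_shared = same_storage(x, v) and same_storage(x, r)

# A transpose shares memory with x but is no longer contiguous.
xt = x.t()

# view cannot express this flattening with strides alone, so it raises.
try:
    xt.view(12)
    view_failed = False
except RuntimeError:
    view_failed = True

# reshape silently copies the data instead of raising.
rt = xt.reshape(12)
reshape_copied = not same_storage(xt, rt)

print(contiguous_equal, contiguous_shared, view_failed, reshape_copied)
```

So if every call site in the model passes a contiguous tensor, swapping one for the other should not change the numerics. The practical difference is that `view` fails loudly where `reshape` would introduce a hidden copy.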