bigcode-project / Megatron-LM

Ongoing research training transformer models at scale
Other

Log tflops and other fixes #33

Closed: RaymondLi0 closed this pull request 1 year ago

RaymondLi0 commented 1 year ago

This should be merged after #32.

RaymondLi0 commented 1 year ago

Thank you for the suggestion, @NouamaneTazi! In https://github.com/bigcode-project/Megatron-LM/pull/33/commits/b18ecf6b332c67e88a20b017a0714172702229b5, I adjusted the formula in the comments. Could you confirm that it is correct?
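For context, a common way to estimate training throughput is the per-iteration FLOPs count from the Megatron-LM paper (Narayanan et al., 2021), 96·B·s·l·h²·(1 + s/(6h) + V/(16·l·h)). Here B is the batch size, s the sequence length, l the number of layers, h the hidden size and V the vocabulary size, and the count assumes full activation recomputation. The sketch below adapts only the QKV-projection term for multi-query attention (MQA), where K and V share a single head of size h/a for a attention heads. It is a hypothetical illustration, not the formula from the linked commit, and the function names are made up.

```python
def flops_per_iteration(batch_size, seq_len, num_layers, hidden_size,
                        num_heads, vocab_size, multi_query=False,
                        recompute_activations=True):
    """Approximate matmul FLOPs for one training iteration of a GPT-style model.

    Hypothetical helper: counts 2 FLOPs per multiply-accumulate and ignores
    non-matmul work (softmax, layernorm, etc.).
    """
    B, s, l, h, V = batch_size, seq_len, num_layers, hidden_size, vocab_size
    head_dim = h // num_heads
    # Q is h x h; K and V are h x h each (MHA) or h x head_dim each (MQA).
    kv_width = head_dim if multi_query else h
    qkv = 2 * B * s * h * (h + 2 * kv_width)
    attn_scores = 2 * B * s * s * h      # Q @ K^T summed over all query heads
    attn_values = 2 * B * s * s * h      # softmax(scores) @ V
    attn_out = 2 * B * s * h * h         # attention output projection
    mlp = 2 * 2 * B * s * h * (4 * h)    # h -> 4h and 4h -> h matmuls
    forward_per_layer = qkv + attn_scores + attn_values + attn_out + mlp
    logits = 2 * B * s * h * V
    # Backward costs ~2x forward; recomputation adds one more forward pass
    # of the transformer layers. Logits are not recomputed (~3x forward).
    layer_passes = 4 if recompute_activations else 3
    return layer_passes * num_layers * forward_per_layer + 3 * logits


def tflops_per_gpu(flops, iteration_time_s, world_size):
    """Achieved TFLOP/s per GPU for one iteration."""
    return flops / (iteration_time_s * world_size * 1e12)
```

With `multi_query=False` and recomputation on, the function reduces exactly to the closed form above. With `multi_query=True`, the per-layer forward QKV cost drops from 6·B·s·h² to 2·B·s·h²·(1 + 2/a), while the attention-score terms are unchanged because every query head still attends over the full sequence.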

NouamaneTazi commented 1 year ago

LGTM! You might want to check the `# TODO: maybe tp_size factor missing here depending on how you implemented MQA` comment.
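One possible reading of that TODO, offered only as an assumption: if the MQA implementation keeps the single shared K/V head replicated on every tensor-parallel rank instead of splitting it, each of the `tp_size` ranks computes the full K/V projection. The FLOPs actually executed for that term then grow by a factor of `tp_size`, while the model FLOPs do not. The hypothetical helper below counts the per-iteration K/V projection FLOPs under both assumptions; its name and arguments are illustrative, not taken from the PR.

```python
def kv_projection_flops(batch_size, seq_len, num_layers, hidden_size,
                        num_heads, tp_size, kv_replicated=True,
                        recompute_activations=True):
    """FLOPs spent on the MQA K and V projections in one training iteration.

    Hypothetical: assumes a single K/V head of size hidden_size // num_heads.
    If kv_replicated is True, every tensor-parallel rank computes the same
    K/V projection, so the FLOPs summed over tp_size ranks scale with tp_size.
    """
    head_dim = hidden_size // num_heads
    # Forward cost of the K and V projections for one layer, counted once.
    forward = 2 * 2 * batch_size * seq_len * hidden_size * head_dim
    # Forward + backward (~3x forward), plus one recomputed forward if enabled.
    passes = 4 if recompute_activations else 3
    per_model_copy = passes * num_layers * forward
    return per_model_copy * (tp_size if kv_replicated else 1)
```

For example, with `tp_size=4` the replicated K/V term is counted four times. Whether the logged TFLOPs should include that factor depends on whether the metric is meant to report model FLOPs or the FLOPs the hardware actually executes.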

RaymondLi0 commented 1 year ago

I left the TODO in the comments. Let's merge this and address it later.