RaymondLi0 closed 1 year ago
Thank you for the suggestion, @NouamaneTazi! In https://github.com/bigcode-project/Megatron-LM/pull/33/commits/b18ecf6b332c67e88a20b017a0714172702229b5 I adjusted the formula that's in the comments. Could you confirm that it is correct?
LGTM! You might wanna check the `# TODO: maybe tp_size factor missing here`, depending on how you implemented MQA.
I left the TODO in the comments. Let's merge this and address it later.
This should be merged after #32