Closed wang2yn84 closed 3 months ago
LGTM! Thank you for the change. Seems the linter needs to be fixed.
Smart change! So q @ kᵀ has shape [h_kv, rep * seq_len, seq_len], and (q kᵀ) v has shape [h_kv, rep * seq_len, d]; you reshape the output back to [h, seq_len, d] at the end.
Correct. The reshape doesn't affect the result.
> LGTM! Thank you for the change. Seems the linter needs to be fixed.

Yup, fixed!
`repeat_kv` in the original llama model copies the KV data in some cases. Replace it with a reshape that folds the repeat factor from the query's heads dimension into its tokens dimension (dim -2), so the KV tensors can be used as-is.
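To illustrate the trick being discussed, here is a minimal sketch (not the PR's actual code): a reference grouped-query attention that expands KV heads with `repeat_interleave`, versus the reshape variant that folds the repeat factor into the query's token dimension. Function names are hypothetical, and masking is omitted for brevity (a causal mask would need to be tiled to match the reshaped rows).

```python
import torch

def attn_repeat_kv(q, k, v):
    # Reference path: expand KV heads to match query heads (copies data).
    h, s, d = q.shape
    h_kv = k.shape[0]
    rep = h // h_kv
    k_r = k.repeat_interleave(rep, dim=0)  # [h, s, d]
    v_r = v.repeat_interleave(rep, dim=0)  # [h, s, d]
    scores = torch.softmax(q @ k_r.transpose(-2, -1) / d**0.5, dim=-1)
    return scores @ v_r                    # [h, s, d]

def attn_reshape_q(q, k, v):
    # Copy-free path: fold the repeat factor into the token dim of q,
    # so the batched matmul runs over h_kv groups with the original k, v.
    h, s, d = q.shape
    h_kv = k.shape[0]
    rep = h // h_kv
    q2 = q.reshape(h_kv, rep * s, d)                                    # [h_kv, rep*s, d]
    scores = torch.softmax(q2 @ k.transpose(-2, -1) / d**0.5, dim=-1)   # [h_kv, rep*s, s]
    out = scores @ v                                                    # [h_kv, rep*s, d]
    return out.reshape(h, s, d)            # undo the fold at the end
```

This works because softmax is applied row-wise and each group of `rep` query heads attends to the same KV head, so stacking those heads along the token dimension changes only the layout, not the result.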