Closed BlackSamorez closed 1 year ago
GPT-2 stores its attention projection weights in one fused matrix laid out as QQQQQKKKKKVVVVV: all query columns first, then all key columns, then all value columns. The function that splits those weights per attention head was broken. This PR fixes it.
It also resolves #99
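For readers unfamiliar with the QQQQQKKKKKVVVVV layout, here is a minimal sketch of a head-aware split. It is not the code from this PR: the function names, the `num_shards`/`shard` parameters, and the assumption that the fused weight has `3 * hidden_size` columns with heads stored contiguously inside each of the Q, K, and V blocks are all illustrative. The naive contiguous split is included only for contrast and is not a claim about what the original bug was.

```python
def fused_qkv_shard_columns(hidden_size, num_heads, num_shards, shard):
    """Column indices of a fused [Q | K | V] weight that one shard should own.

    Hypothetical helper: each shard receives whole attention heads, taken
    from the same positions inside the Q, K, and V blocks.
    """
    if hidden_size % num_heads or num_heads % num_shards:
        raise ValueError("hidden_size, num_heads and num_shards must divide evenly")
    head_dim = hidden_size // num_heads
    heads_per_shard = num_heads // num_shards
    first = shard * heads_per_shard * head_dim
    last = first + heads_per_shard * head_dim

    columns = []
    for block in range(3):  # 0 = Q block, 1 = K block, 2 = V block
        offset = block * hidden_size
        columns.extend(range(offset + first, offset + last))
    return columns


def naive_shard_columns(hidden_size, num_shards, shard):
    """Contiguous slice of the fused matrix, ignoring the Q/K/V block structure."""
    width = 3 * hidden_size // num_shards
    return list(range(shard * width, (shard + 1) * width))


def describe(columns, hidden_size):
    """Label each column index with the block (Q, K, or V) it falls in."""
    return "".join("QKV"[c // hidden_size] for c in columns)
```

With `hidden_size=4`, `num_heads=2`, and two shards, the head-aware split gives each shard one head of each of Q, K, and V, while the contiguous split gives the first shard all of Q and half of K, and no V at all. The returned indices could be passed to an indexing call on the real weight tensor.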