Closed Kyeongpil closed 4 years ago
As described in gpt2_345m_hparams, I think the number of self-attention heads (line 370) should be 16, not 24.
I agree!
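For context, the widely published GPT-2 345M ("medium") configuration uses 24 transformer layers, a hidden size of 1024, and 16 attention heads, so the 24 on that line is plausibly the layer count rather than the head count. A minimal sanity check (the dict keys here are illustrative, not necessarily the repo's actual hparam names):

```python
# Reference hyperparameters for GPT-2 345M ("medium"):
# 24 transformer blocks, hidden size 1024, 16 attention heads.
gpt2_345m = {
    "num_layers": 24,            # transformer blocks -- this is where 24 belongs
    "hidden_size": 1024,         # model dimension
    "num_attention_heads": 16,   # hidden_size must divide evenly by this
}

# The hidden size must split evenly across the heads.
assert gpt2_345m["hidden_size"] % gpt2_345m["num_attention_heads"] == 0

head_dim = gpt2_345m["hidden_size"] // gpt2_345m["num_attention_heads"]
print(head_dim)  # 64, the standard per-head dimension for GPT-2 medium
```

With 24 heads instead, 1024 / 24 is not an integer, which is another sign that 16 is the intended value.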