Open Fly-Pluche opened 3 years ago
I'm not sure why you don't use a scaling factor in your self-attention. Is there a reason for leaving it out?
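For context, the "scaling factor" here refers to the standard scaled dot-product attention, softmax(QKᵀ / √d_k)·V, where the raw query-key dot products are divided by √d_k before the softmax. Without it, scores grow with the key dimension d_k and the softmax saturates. Below is a minimal pure-Python sketch for illustration. The function names `attention_weights` and `attention` and the `scaled` flag are hypothetical and not taken from this repository's code.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(qi, k, scaled=True):
    # Hypothetical helper: attention weights of one query over all keys.
    # With scaled=True, dot products are multiplied by 1/sqrt(d_k).
    d_k = len(k[0])
    scale = 1.0 / math.sqrt(d_k) if scaled else 1.0
    scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
    return softmax(scores)

def attention(q, k, v, scaled=True):
    # Single-head attention over plain lists of vectors:
    # each output row is a weighted average of the value vectors.
    out = []
    for qi in q:
        w = attention_weights(qi, k, scaled)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out
```

With d_k = 64, a query of all ones against keys of all ones and all 0.5 gives raw scores 64 and 32, so the unscaled softmax is essentially one-hot. After scaling by 1/8 the scores are 8 and 4, and the weights stay softer (about 0.982 and 0.018), which keeps gradients through the softmax from vanishing.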