Open headscott opened 5 days ago
In your paper I saw that you used Softmax, but I can't find it in your code. I only see ReLU. Did I misunderstand that, or do you not use Softmax at all?
Oh wait, an update: I found Softmax. But you only use it inside the loss function, right? Not as an activation function?
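For context on what I mean: in PyTorch (assuming that's what the repo uses; this is just an illustrative sketch, not your actual code), `cross_entropy` fuses log-softmax into the loss, so the network itself only needs ReLU on its hidden layers and can output raw logits:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # raw network outputs, no softmax applied
targets = torch.tensor([0, 2, 1, 0])  # class labels for the 4 samples

# cross_entropy applies log-softmax internally before the NLL loss,
# which is why no explicit Softmax layer appears in the model.
loss_fused = F.cross_entropy(logits, targets)

# Equivalent formulation with the softmax made explicit:
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_fused, loss_manual))  # True
```

So "Softmax only in the loss" would still match the paper, since the fused loss is mathematically the same as softmax followed by negative log-likelihood.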