question about equation 7 in the paper

CoinCheung commented 3 years ago

Hi,

Thanks for the work. I saw in equation 7 of the paper that the probability used for the cross entropy is not a pure softmax activation. In the numerator, the input of exp is wx+b, while in the denominator, the inputs of the exps are wx+b+(\lambda v^T \Sigma v)/2. Why do you use nn.CrossEntropyLoss directly in the paper? Shouldn't it be based on softmax?

Also, as far as I know, the covariance matrix of a Gaussian distribution is a square matrix, so why is it defined with the shape class_num x feature_num?

blackfeather-wang commented 3 years ago

Thank you for your attention!

(1) In fact, we have v = 0 for the numerator, apparently. ;)
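
To make point (1) concrete, here is a minimal sketch in plain Python (not the repository's code). It assumes that v for class j is the difference between the class-j weight vector and the true-class weight vector, which is what makes v = 0 for the numerator. Under that assumption, adding (\lambda v^T \Sigma v)/2 to every logit leaves the true-class logit unchanged, so an ordinary softmax cross-entropy (what nn.CrossEntropyLoss computes) on the augmented logits gives the same value as equation 7 written out directly. The function names and toy numbers are hypothetical.

```python
import math

def quad_form(v, sigma):
    """Return v^T Sigma v for a vector v and a square matrix Sigma (nested lists)."""
    n = len(v)
    return sum(v[r] * sigma[r][c] * v[c] for r in range(n) for c in range(n))

def augmented_logits(logits, weights, sigma_y, y, lam):
    """Add (lam / 2) * v_j^T Sigma_y v_j to each logit, with v_j = w_j - w_y (assumed)."""
    out = []
    for j, z in enumerate(logits):
        v = [wj - wy for wj, wy in zip(weights[j], weights[y])]
        out.append(z + 0.5 * lam * quad_form(v, sigma_y))
    return out

def cross_entropy(logits, y):
    """Plain softmax cross-entropy: -log(exp(z_y) / sum_j exp(z_j))."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[y]

# Toy problem: 3 classes, 2 features (all numbers are illustrative).
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
bias = [0.1, -0.2, 0.0]
x = [0.5, 2.0]
sigma_y = [[1.0, 0.2], [0.2, 0.5]]  # covariance of the true class
y, lam = 1, 0.5

logits = [sum(wk * xk for wk, xk in zip(w, x)) + b for w, b in zip(weights, bias)]
aug = augmented_logits(logits, weights, sigma_y, y, lam)

# Equation 7 written out directly: plain logit on top, augmented logits below.
eq7 = -math.log(math.exp(logits[y]) / sum(math.exp(z) for z in aug))

print(aug[y] == logits[y])                         # true-class logit is unchanged
print(abs(cross_entropy(aug, y) - eq7) < 1e-12)    # both forms agree
```

In other words, the asymmetry between numerator and denominator is only apparent: the extra term is present in both, it just vanishes for the true class.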

(2) I guess you saw this in our code for ImageNet. As we state in the paper, we approximate the covariance matrices by their diagonals on ImageNet (see Sec. 6.1) to save GPU memory (this reduces the covariance tensor from 1000x2048x2048 to 1000x2048). You may check our code on CIFAR for the vanilla ISDA.
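
A rough sketch of what the diagonal approximation in point (2) buys, again in plain Python and not taken from the repository. The shapes come from the reply above; the 4 bytes per value (float32) is an assumption, and the helper names are hypothetical. The second half checks that, for a diagonal covariance, the quadratic form v^T \Sigma v reduces to a weighted sum of squares, so only one row of feature_num variances per class needs to be stored.

```python
def quad_form_full(v, sigma):
    """v^T Sigma v with a full square covariance matrix (nested lists)."""
    n = len(v)
    return sum(v[r] * sigma[r][c] * v[c] for r in range(n) for c in range(n))

def quad_form_diag(v, diag):
    """Same quantity when Sigma is diagonal: sum_k diag[k] * v[k]**2."""
    return sum(d * vk * vk for d, vk in zip(diag, v))

num_classes, feature_dim = 1000, 2048
bytes_per_value = 4  # assumption: float32 storage

full_bytes = num_classes * feature_dim * feature_dim * bytes_per_value
diag_bytes = num_classes * feature_dim * bytes_per_value

print(full_bytes / 2**30, "GiB for the full 1000x2048x2048 tensor")
print(diag_bytes / 2**20, "MiB for the diagonal 1000x2048 tensor")

# Small check that the two quadratic forms agree when Sigma is diagonal.
v = [1.0, -2.0, 0.5]
diag = [0.3, 1.0, 2.0]
sigma = [[diag[r] if r == c else 0.0 for c in range(3)] for r in range(3)]
print(abs(quad_form_full(v, sigma) - quad_form_diag(v, diag)) < 1e-12)
```

Under the float32 assumption the full tensor needs about 15.6 GiB against under 8 MiB for the diagonals, which is why the ImageNet code stores a class_num x feature_num array instead of one square matrix per class.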

CoinCheung commented 3 years ago

I understand. Thanks a lot!!!

blackfeather-wang / ISDA-for-Deep-Networks

question about equation 7 in the paper #7