SakurajimaMaiii / TSD

[CVPR 2023] Feature Alignment and Uniformity for Test Time Adaptation
https://arxiv.org/abs/2303.10902
MIT License

Question regarding the scores in softmax_kl_loss function. #9

Closed YunYunY closed 11 months ago

YunYunY commented 1 year ago

Dear authors,

As I understand it, the variable 'scores' already holds normalized probabilities, so why does the softmax_kl_loss function apply log_softmax rather than log directly? As I understand it, the KL divergence is computed between the log probabilities of the predicted distribution and the true distribution. In your implementation, the result would be log(softmax(softmax(p))) rather than log(softmax(p)). Is there a specific reason you implemented it this way?
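For reference, a minimal sketch of the difference being described (editor's illustration, not the repository's code; the tensor names are hypothetical):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 7)       # hypothetical batch of logits
scores = logits.softmax(dim=1)   # already-normalized probabilities

log_p = scores.log()                    # log(softmax(logits)), the intended term
double = F.log_softmax(scores, dim=1)   # log(softmax(softmax(logits))), a second normalization

print(torch.allclose(log_p, double))    # False: the two quantities differ
```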

The other question I have relates to Eq. (4). Where is it implemented? I was unable to find it in the TSD code. Please let me know if I misunderstood. Thank you.

SakurajimaMaiii commented 1 year ago

Thanks for your interest. (1) I think what you said is right. It is an implementation error; I will try to fix it as soon as possible. (2) After more ablation studies, we found that the consistency filter (Eq. (4) in the paper) is useful on many datasets (e.g., PACS/DomainNet/CIFAR-10-C) but not on some others (e.g., ImageNet-C), so we removed it. It is easy to implement Eq. (4) if you want to check it (see the sketch below). I will update this as soon as possible.
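For readers who want to try it, here is a hedged sketch of a consistency filter in the spirit of Eq. (4), assuming it keeps only samples on which two prediction heads agree; `logits_cls` and `logits_proto` are hypothetical stand-ins for the linear classifier's and the prototype classifier's outputs, not names from the TSD code:

```python
import torch

def consistency_mask(logits_cls: torch.Tensor, logits_proto: torch.Tensor) -> torch.Tensor:
    # Boolean mask: True where both heads predict the same class,
    # i.e. the sample passes the consistency filter.
    return logits_cls.argmax(dim=1) == logits_proto.argmax(dim=1)

# Usage sketch: mask the per-sample loss before averaging.
# mask = consistency_mask(logits_cls, logits_proto)
# loss = per_sample_loss[mask].mean()
```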

YunYunY commented 1 year ago

Hello, thanks for the quick response. Regarding (1), I tried implementing the correct KL by removing the extra log, but I can't get comparable results. So far I have only tried PACS, testing on the last domain, and I didn't tune any other parameters. Thank you.
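For concreteness, the corrected term discussed here would look roughly like the sketch below (editor's illustration, assuming `scores` already holds probabilities and `targets` is the reference distribution; `F.kl_div` expects log-probabilities as its first argument):

```python
import torch
import torch.nn.functional as F

def softmax_kl_loss(scores: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Take log() of the already-normalized scores instead of log_softmax,
    # clamping to avoid log(0); this removes the double normalization.
    return F.kl_div(scores.clamp_min(1e-8).log(), targets, reduction="batchmean")
```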