YuanGongND / ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

about nce loss cal #13

Closed liyunlongaaa closed 2 years ago

liyunlongaaa commented 2 years ago

Hi, I am wondering if dim should be 0 at https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/models/ast_models.py#L98-L99

https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/models/ast_models.py#L350-L352

`total[j, i]` is the inner product of x_j and c_i^T, so applying softmax over dim -1 of `total` normalizes (sums) over c_i rather than over x_i, which is not consistent with Equation 1 of the paper.
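To make the difference between the two axes concrete, here is a minimal sketch in plain Python rather than PyTorch. The 3x3 score matrix and the helper names (`softmax_rows`, `softmax_cols`) are hypothetical and only illustrate what normalizing over dim -1 versus dim 0 would mean for a matrix laid out as `total[j][i] = <x_j, c_i>`; it is not the repository's code.

```python
import math

def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    denom = sum(exps)
    return [e / denom for e in exps]

def softmax_rows(matrix):
    """Normalize each row: analogous to softmax over dim=-1 of a 2-D tensor."""
    return [softmax(row) for row in matrix]

def softmax_cols(matrix):
    """Normalize each column: analogous to softmax over dim=0 of a 2-D tensor."""
    cols = [softmax(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*cols)]  # transpose back to row-major

# Hypothetical scores, laid out as total[j][i] = <x_j, c_i>.
total = [[2.0, 0.5, 0.1],
         [0.3, 1.5, 0.2],
         [0.4, 0.6, 1.0]]

by_c = softmax_rows(total)  # each row sums to 1: fixed x_j, sum runs over c_i
by_x = softmax_cols(total)  # each column sums to 1: fixed c_i, sum runs over x_j

print("row sums under dim=-1:", [sum(row) for row in by_c])
print("column sums under dim=0:", [sum(col) for col in zip(*by_x)])
```

With this layout, the diagonal entries of `by_c` and `by_x` generally differ, because each is divided by a different denominator.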

YuanGongND commented 2 years ago

Hi there,

I guess you are correct that dim should be 0 to match Eq. (1) of the paper.

https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/models/ast_models.py#L98-L99

My guess is that this affects `correct` (which is not important) but does not change `nce` much.
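As a rough illustration of how the axis choice could change these two quantities, here is a simplified sketch in plain Python. It assumes, as in common CPC-style implementations, that the NCE term is the sum of the diagonal of a log-softmax over `total` and that the match count is an argmax check on the same scores. The function `nce_and_correct` and the 2x2 matrix are hypothetical, and this is not a reproduction of the repository's exact computation.

```python
import math

def log_softmax(values):
    """Numerically stable log-softmax over a flat list of floats."""
    peak = max(values)
    log_denom = peak + math.log(sum(math.exp(v - peak) for v in values))
    return [v - log_denom for v in values]

def nce_and_correct(total, over_x):
    """Sum of diagonal log-probabilities and count of argmax matches.

    total[j][i] = <x_j, c_i>. With over_x=True the scores are normalized
    across x_j for each c_i (dim=0); with over_x=False they are normalized
    across c_i for each x_j (dim=-1).
    """
    n = len(total)
    nce, correct = 0.0, 0
    for k in range(n):
        if over_x:
            scores = [total[j][k] for j in range(n)]  # column k
        else:
            scores = total[k]                         # row k
        nce += log_softmax(scores)[k]                 # diagonal entry (k, k)
        best = max(range(n), key=lambda idx: scores[idx])
        correct += int(best == k)
    return nce, correct

# Hypothetical, deliberately asymmetric scores.
total = [[1.0, 2.0],
         [0.0, 3.0]]

print("dim=-1:", nce_and_correct(total, over_x=False))
print("dim=0: ", nce_and_correct(total, over_x=True))
```

For a symmetric `total` the two settings coincide; how far they diverge on real data depends on how asymmetric the score matrix is.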

We followed an old implementation of CPC and made the same mistake as the one described in https://github.com/jefflai108/Contrastive-Predictive-Coding-PyTorch/issues/19.

Sorry for the confusion. I added a note above line 350, but I am not planning to change the code, for reproducibility reasons.

-Yuan

liyunlongaaa commented 2 years ago

Thank you for your reply. I should learn from you to do good work and share it with others (oh, I want to graduate O(∩_∩) )!

YuanGongND commented 2 years ago

Thanks for the good catch - and good luck with your research!!