YuanGongND / ssast

Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

about nce loss cal #13

Closed: liyunlongaaa closed this issue 1 year ago

liyunlongaaa commented 1 year ago

Hi, I am wondering if dim should be 0 at https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/models/ast_models.py#L98-L99

https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/models/ast_models.py#L350-L352

`total[j, i]` is the inner product of x_j and c_i, so applying softmax along dim -1 of `total` normalizes over the c_i rather than over the x_j, which is not consistent with Equation 1 of the paper.
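A minimal sketch of the difference, written in plain Python rather than PyTorch so it runs without dependencies. It assumes, as the comment above implies, that Equation 1 normalizes over the x_j for a fixed c_i. The toy vectors and all helper names here are made up for illustration and are not taken from `ast_models.py`.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]


def score_matrix(xs, cs):
    """total[j][i] = inner product of x_j and c_i."""
    return [[sum(a * b for a, b in zip(x, c)) for c in cs] for x in xs]


def log_softmax_rows(total):
    """Analogue of LogSoftmax(dim=-1): for each x_j, normalize over the c_i."""
    return [[math.log(p) for p in softmax(row)] for row in total]


def log_softmax_cols(total):
    """Analogue of LogSoftmax(dim=0): for each c_i, normalize over the x_j."""
    cols = [list(col) for col in zip(*total)]
    col_log_probs = [[math.log(p) for p in softmax(col)] for col in cols]
    # Transpose back so the result is indexed [j][i] like `total`.
    return [list(row) for row in zip(*col_log_probs)]


def nce_from_diag(log_probs):
    """Mean negative log-probability of the matched pairs (the diagonal)."""
    n = len(log_probs)
    return -sum(log_probs[k][k] for k in range(n)) / n


# Hypothetical toy inputs: three targets x_j and three predictions c_i.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cs = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

total = score_matrix(xs, cs)
loss_rows = nce_from_diag(log_softmax_rows(total))  # sums over c_i
loss_cols = nce_from_diag(log_softmax_cols(total))  # sums over x_j

print("dim=-1 (over c_i):", loss_rows)
print("dim=0  (over x_j):", loss_cols)
```

The two losses share the same numerators (the diagonal of `total`) and differ only in the denominator, so they coincide when `total` is symmetric and differ otherwise.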

YuanGongND commented 1 year ago

Hi there,

I guess you are correct that `dim` should be 0 to match Eq. (1) of the paper.

https://github.com/YuanGongND/ssast/blob/a1a3eecb94731e226308a6812f2fbf268d789caf/src/models/ast_models.py#L98-L99

My guess is that this affects `correct` (which is not important) but does not change the NCE loss much.

We followed an old implementation of CPC and made the same mistake as described in https://github.com/jefflai108/Contrastive-Predictive-Coding-PyTorch/issues/19.

Sorry for the confusion. I added a note above line 350, but I am not planning to change the code, for reproducibility reasons.

-Yuan

liyunlongaaa commented 1 year ago

Thank you for your reply. I should learn from you to do good work and share it with others (oh, and I want to graduate O(∩_∩) )!

YuanGongND commented 1 year ago

Thanks for the good catch - and good luck with your research!!