DonkeyShot21 / cassle

Official repository for the paper "Self-Supervised Models are Continual Learners" (CVPR 2022)

Question about contrastive distillation loss #17

Open SkrighYZ opened 9 months ago

SkrighYZ commented 9 months ago

Hi,

I have a few questions about the SimCLR distillation code.

  1. https://github.com/DonkeyShot21/cassle/blob/b5b0929c3b468cd41740a529d58e92ee4e6ace61/cassle/losses/simclr.py#L21 It seems that the predicted features (p) are not included among the negatives, which differs from what is suggested in the paper (Appendix B). I understand that p and z are swapped here (to make the loss symmetric?) https://github.com/DonkeyShot21/cassle/blob/b5b0929c3b468cd41740a529d58e92ee4e6ace61/cassle/distillers/contrastive.py#L65-L68 but there are still no comparisons between different samples within p. (See the first sketch below the list for what I expected.)

  2. In the paper, the distillation loss is applied to the two views independently, but based on the code above it looks like the views are used jointly. Does that mean we should use them jointly to reproduce the results? (See the second sketch below.)

  3. https://github.com/DonkeyShot21/cassle/blob/b5b0929c3b468cd41740a529d58e92ee4e6ace61/cassle/losses/simclr.py#L30-L33 The four lines of code here seem to leave logit_mask as an all-ones matrix. In my understanding, the main diagonal should be set to False. Am I missing something? (See the third sketch below.)
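
To make question 1 concrete, here is a minimal sketch of the loss I expected from my reading of Appendix B. The function name and the single-view simplification are mine, purely for illustration, and may not match your intended formulation: each p_i is pulled towards its frozen target z_i, while both the other frozen features z_j and the other predicted features p_j act as negatives.

```python
import torch
import torch.nn.functional as F

def distill_contrastive_expected(p, z, temperature=0.2):
    """My reading of Appendix B (single view, illustration only):
    positives are p_i . z_i; negatives include both z_j and p_j for j != i."""
    p = F.normalize(p, dim=-1)
    z = F.normalize(z, dim=-1)
    n = p.size(0)

    pos = torch.einsum("if,if->i", p, z) / temperature       # p_i . z_i
    neg_pz = torch.einsum("if,jf->ij", p, z) / temperature   # p_i . z_j
    neg_pp = torch.einsum("if,jf->ij", p, p) / temperature   # p_i . p_j (the term I don't see in the code)

    off_diag = ~torch.eye(n, dtype=torch.bool, device=p.device)
    negatives = torch.cat(
        [neg_pz[off_diag].view(n, n - 1), neg_pp[off_diag].view(n, n - 1)], dim=1
    )

    logits = torch.cat([pos.unsqueeze(1), negatives], dim=1)
    labels = torch.zeros(n, dtype=torch.long, device=p.device)  # column 0 holds the positive
    return F.cross_entropy(logits, labels)
```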
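
For question 2, this is the distinction I am asking about, written out with a stand-in loss (`nce` below is just a placeholder InfoNCE between predicted and frozen features, not your implementation):

```python
import torch
import torch.nn.functional as F

def nce(p, z, temperature=0.2):
    # stand-in distillation loss between predicted and frozen features;
    # the exact form is not the point here
    p, z = F.normalize(p, dim=-1), F.normalize(z, dim=-1)
    logits = p @ z.T / temperature
    labels = torch.arange(p.size(0), device=p.device)
    return F.cross_entropy(logits, labels)

p1, p2, z1, z2 = (torch.randn(8, 128) for _ in range(4))

# (a) the two views distilled independently, which is how I read the paper
loss_independent = nce(p1, z1) + nce(p2, z2)

# (b) the two views concatenated and distilled jointly, which is how I read the code
loss_joint = nce(torch.cat([p1, p2]), torch.cat([z1, z2]))
```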
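
And for question 3, a toy check of what I mean (this is my paraphrase of the linked lines, so apologies if I misread them):

```python
import torch

b = 4  # toy batch size; features come in two views, so the mask is 2b x 2b

# As far as I can tell, the linked lines start from an all-ones mask and then
# only ever set diagonals back to True, so the mask never changes.
logit_mask = torch.ones(2 * b, 2 * b, dtype=torch.bool)
logit_mask.fill_diagonal_(True)   # no-op: the diagonal is already True
assert bool(logit_mask.all())     # logit_mask is still an all-ones matrix

# What I expected instead (standard SimCLR-style masking): remove each sample's
# own entry from the denominator by setting the main diagonal to False.
expected_mask = torch.ones(2 * b, 2 * b, dtype=torch.bool)
expected_mask.fill_diagonal_(False)
```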

TIA