RElbers / info-nce-pytorch

PyTorch implementation of the InfoNCE loss for self-supervised learning.
MIT License

Question about the code when negative_keys is None #13

Closed Yuntian9708 closed 1 year ago

Yuntian9708 commented 1 year ago

I have a question about the code for `negative_keys=None`:

If `negative_keys` is None, then negative keys are implicitly off-diagonal positive keys:

```python
# Cosine between all combinations
logits = query @ transpose(positive_key)

# Positive keys are the entries on the diagonal
labels = torch.arange(len(query), device=query.device)
```

Why are the labels `torch.arange(len(query))`? That gives 0, 1, 2, 3, 4, 5, ... I think the labels for the query-positive pairs should be `torch.ones(len(query))`.

Yuntian9708 commented 1 year ago

I also think that when computing the positive logits, the lengths of the positive keys and the query can differ. Following the InfoNCE pseudocode from MoCo, we can rewrite `positive_logit = torch.sum(query * positive_key, dim=1, keepdim=True)` as `positive_logit = torch.mm(query, positive_key.transpose(0, 1))`. For example, in my task the query size is (64, 320), the positive key size is (24, 320), and the negative key size is (100, 320), and this loss still runs successfully. Hope I didn't make mistakes.

RElbers commented 1 year ago

Hi. If there is no `negative_keys`, then the logits are the matrix multiplication `query @ transpose(positive_key)`. We want to increase the values on the diagonal (the positive logits) and decrease the other values (the negative logits). You can treat it as a classification problem where the label for the 1st sample is 0, for the 2nd is 1, for the 3rd is 2, etc. That is why the labels are `arange(len(query))` when there are no negative keys.
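As a minimal sketch of that idea (not the repo's exact code; the 0.1 temperature is an assumption), the whole no-negatives path boils down to cross-entropy over the similarity matrix:

```python
import torch
import torch.nn.functional as F

# Sketch of InfoNCE without explicit negatives: each query's positive sits
# on the diagonal of the similarity matrix, and every off-diagonal entry
# implicitly acts as a negative. The 0.1 temperature is an assumption.
query = F.normalize(torch.randn(8, 128), dim=-1)         # (N, D)
positive_key = F.normalize(torch.randn(8, 128), dim=-1)  # (N, D)

logits = query @ positive_key.T               # (N, N) cosine similarities
labels = torch.arange(len(query))             # positive for row i is column i
loss = F.cross_entropy(logits / 0.1, labels)
```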

The code checks the number of samples in the query and positives and raises an error if `len(query) != len(positive_key)`. I just had a quick look at the code for MoCo, and it looks like it also requires the query and positive keys to have the same shape (both NxC: https://github.com/facebookresearch/moco/blob/5a429c00bb6d4efdf511bf31b6f01e064bf929ab/moco/builder.py#L157C1-L157C63). For the negative_keys the size does not matter.
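To illustrate the shape constraints, here is a sketch assuming the repo's `InfoNCE` class and its `negative_mode='unpaired'` option (values are dummies):

```python
import torch
from info_nce import InfoNCE

# query and positive_key must have the same number of rows (N); the number
# of unpaired negative keys is free.
loss_fn = InfoNCE(negative_mode='unpaired')
query = torch.randn(64, 320)
positive_key = torch.randn(64, 320)    # same N as query
negative_keys = torch.randn(100, 320)  # any number of negatives works
loss = loss_fn(query, positive_key, negative_keys)
```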

Yuntian9708 commented 1 year ago

OK, thank you! Now I understand the 1st question. Regarding the sizes of the query, positive keys, and negative keys: in my task they have the same embedding size but different counts within one batch. For example, with batch size 128, the number of queries is 28, the number of positive keys is 40, and the number of negative keys is 60. I can still get positive logits via `query @ transpose(positive_key)`. Hope I didn't make mistakes. My task is not exactly the same as MoCo's: the queries and positive keys are not in one-to-one correspondence, but are two groups to be compared.

Yuntian9708 commented 1 year ago

I guess I know the answer to my previous question. After carefully reading the pseudocode and InfoNCE formula in MoCo, I think that contrastive learning over two groups (a query set and a positive set with different numbers of samples) can also be treated as a multiclass classification task: given m queries, n positives, and p negatives, there are (n + p) classes, and we classify the m queries into the n positive classes.
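A hedged sketch of that two-group view (not from the repo; it assumes each query has a known matching positive among the n, indexed here by a hypothetical `target`, and an assumed 0.1 temperature):

```python
import torch
import torch.nn.functional as F

# m queries scored against n positives and p negatives gives an
# m x (n + p) logits matrix; the correct class for each query is the
# index of its matching positive.
m, n, p, d = 28, 40, 60, 320
query = F.normalize(torch.randn(m, d), dim=-1)
positives = F.normalize(torch.randn(n, d), dim=-1)
negatives = F.normalize(torch.randn(p, d), dim=-1)

logits = query @ torch.cat([positives, negatives]).T  # (m, n + p)
target = torch.randint(0, n, (m,))  # hypothetical: each query's positive index
loss = F.cross_entropy(logits / 0.1, target)
```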

whu-lyh commented 1 year ago

I am confused about the negatives when no explicit negatives are fed into InfoNCE. Let me give a simple example: given two inputs in a batch (batch_size equals 2, containing queries A, B and positives $A^+$, $B^+$, where each input is C-dimensional), is the negative of A taken to be $B^+$? My question is: if A and B happen to be similar samples, does InfoNCE still work? Am I clear? 😄

whu-lyh commented 1 year ago

@RElbers @Yuntian9708 Thanks~

Yuntian9708 commented 1 year ago

From my understanding, the objective of InfoNCE is to classify each query and its corresponding positive key into one class. In the example you mentioned above, A and $A^+$ should be classified into one class, and B and $B^+$ into another; $A^+$ should be augmented from A. Since contrastive learning is a type of self-supervised learning, it seems we cannot simply declare A and B to be similar samples based on label information. I suggest you read the pseudocode of InfoNCE in the MoCo paper. Did I understand your question correctly?

Yarmohamadshr commented 1 year ago

Hi, I have a question. Does this library still work? I installed it in Google Colab, but when I try to import it, I get an error that the module does not exist. Can you help me with that?

```
ModuleNotFoundError: No module named 'info_nce_pytorch'
```

RElbers commented 1 year ago

Hi. I'm not sure why it doesn't work for you, but it still works for me. Did you use `!pip install info_nce_pytorch` to install it in Colab? After installing, the import is just `import info_nce`, without the `_pytorch` part.
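For a quick Colab sanity check (a sketch; the `InfoNCE` class name follows the repo's README, and the shapes are dummies):

```python
# !pip install info_nce_pytorch
import torch
from info_nce import InfoNCE

loss_fn = InfoNCE()
query = torch.randn(8, 128)
positive_key = torch.randn(8, 128)
print(loss_fn(query, positive_key))  # scalar loss; off-diagonal pairs act as negatives
```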

Yarmohamadshr commented 1 year ago

Good morning. Thank you for the help; it works now after I restarted the runtime and tried again. I have another question. I have two columns of text: the first column is a request, and the second is the answer to that request. To train my model with this loss function, how should I feed the embeddings to the loss? Is the query you mentioned just one embedding per entry in the first column? I want to train the model so that, for a future new request, it predicts the relevant answer.