MishaLaskin / curl

CURL: Contrastive Unsupervised Representation Learning for Sample-Efficient Reinforcement Learning
MIT License
561 stars, 88 forks

Generating the labels with torch.arange #14

Closed ckusumadewa-lab closed 3 years ago

ckusumadewa-lab commented 3 years ago

First of all, thank you so much for kindly sharing your great research and also the code.

However, I have one question about how the labels are generated from the logits in the following code (curl_sac.py line 424):

labels = torch.arange(logits.shape[0]).long().to(self.device)

What if, for example, the batch sampled from the replay buffer contains several identical observations? Won't the code then treat identical features as different classes, since we use torch.arange?

Please correct me if I am wrong. Thank you so much.
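
To make the concern concrete, here is a minimal sketch, not the repository's code. It assumes hypothetical one-hot features, uses the same tensor as both anchors and keys for brevity, and uses a plain dot product in place of whatever similarity function the repository computes. Rows 0 and 2 are made identical on purpose, so the arange labels mark them as negatives of each other.

```python
import torch
import torch.nn.functional as F

# Hypothetical features for a batch of 4 observations; rows 0 and 2 are
# deliberately identical to mimic a duplicate drawn from the replay buffer.
features = torch.tensor([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],  # same as row 0
    [0.0, 0.0, 1.0],
])

# logits[i, j] scores anchor i against key j. A plain dot product is an
# assumption here, standing in for the repository's similarity function.
logits = features @ features.T

# Same construction as the line quoted above: row i's target class is i,
# so the diagonal entries are the positives and everything else is a negative.
labels = torch.arange(logits.shape[0]).long()

# Keep the loss per row so the effect of the duplicate is visible.
per_row_loss = F.cross_entropy(logits, labels, reduction="none")

print(labels.tolist())
print(per_row_loss.tolist())
```

In this toy batch, rows 0 and 2 receive a larger loss than rows 1 and 3, because logits[0, 2] and logits[2, 0] are as large as the diagonal entries yet are counted as negatives.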

MishaLaskin commented 3 years ago

This happens sometimes, but the likelihood is low, and the InfoNCE loss is robust to small perturbations like this because there are many other negatives in the batch. More generally, this is an unsupervised learning loss, so we don't have access to the image labels and therefore don't know whether two observations are the same (unless they coincide perfectly in pixel space, which is unlikely).
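
For a rough sense of scale, the sketch below does a back-of-the-envelope calculation. It assumes the batch is drawn uniformly with replacement from the replay buffer. The buffer size and batch size are illustrative values, not settings taken from the repository, and both function names are hypothetical. It estimates how often a batch contains a repeated index, and what share of the off-diagonal logits one repeated pair turns into false negatives.

```python
import math


def prob_duplicate_index(buffer_size: int, batch_size: int) -> float:
    """Probability that at least two of `batch_size` uniform draws
    (with replacement) from `buffer_size` slots land on the same slot."""
    if batch_size > buffer_size:
        return 1.0  # pigeonhole: a repeat is guaranteed
    # P(all distinct) = prod_{k=0}^{B-1} (1 - k/N), accumulated in log space.
    log_p_unique = sum(math.log1p(-k / buffer_size) for k in range(batch_size))
    return 1.0 - math.exp(log_p_unique)


def false_negative_fraction(batch_size: int, duplicate_pairs: int = 1) -> float:
    """Share of off-diagonal logits that are false negatives when
    `duplicate_pairs` pairs of rows in the batch are identical."""
    off_diagonal = batch_size * (batch_size - 1)
    return 2 * duplicate_pairs / off_diagonal


# Illustrative numbers only (assumptions, not the repository's defaults).
buffer_size, batch_size = 100_000, 128
p_dup = prob_duplicate_index(buffer_size, batch_size)
frac = false_negative_fraction(batch_size)
print(f"P(repeated index in a batch) = {p_dup:.4f}")
print(f"false-negative share of off-diagonal logits = {frac:.6f}")
```

Under these assumed numbers, a batch contains a repeated index roughly 8% of the time, and a single repeated pair affects only 2 of the 128 x 127 off-diagonal logits.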

(In reply to ckusumadewa-lab's message of Tue, Aug 25, 2020, 5:55 AM.)

aravindsrinivas commented 3 years ago

Closing issue.