Is your proposal related to a problem?
Discriminating among a few objects is a standard language emergence setup: Sender gets an object and sends a message, and Receiver then has to choose that object among a set of distractors, based on the message.
At the same time, a similar idea is used in self-supervised learning: an Encoder net learns a representation of an image, and another, smaller net (the projection head) maps that representation to a space in which a discriminative loss is applied. SimCLR is close to this setup (modulo the augmentations).
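For concreteness, a minimal sketch of the encoder-plus-projection-head pattern described above; the layer sizes are placeholders, not necessarily SimCLR's actual values:

```python
import torch.nn as nn

class ProjectionHead(nn.Module):
    # Small MLP that maps encoder features into the space where the
    # discriminative (contrastive) loss is applied, as in SimCLR.
    def __init__(self, in_dim=512, hidden_dim=512, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, features):  # features: output of the Encoder net
        return self.net(features)
```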
In this issue I propose to implement SimCLR as an EGG game on images, say as EGG/zoo/simclr.
Describe the solution you'd like to have implemented
We probably do not need data augmentation at the beginning.
v0: SimCLR on MNIST. At the first step, Sender would get an image from MNIST and send a message, and Receiver would need to choose that image, using the rest of the batch as negatives. Here we'd have three options for optimising the channel: continuous messages (we expect the agents to overfit successfully), Gumbel-Softmax, and REINFORCE.
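A minimal plain-PyTorch sketch of this v0 game (not EGG's actual API; the vocabulary size and layer sizes are placeholders), using the Gumbel-Softmax option:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Sender(nn.Module):
    # Encodes a flattened MNIST image and emits logits over a symbol vocabulary.
    def __init__(self, vocab_size=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
        self.to_logits = nn.Linear(256, vocab_size)

    def forward(self, image):
        return self.to_logits(self.encoder(image))

class Receiver(nn.Module):
    # Embeds the message and each candidate image into a shared space;
    # the rest of the batch provides the negatives.
    def __init__(self, vocab_size=64, dim=128):
        super().__init__()
        self.msg_embed = nn.Linear(vocab_size, dim)
        self.img_embed = nn.Sequential(nn.Flatten(), nn.Linear(784, dim))

    def forward(self, message, images):
        m = self.msg_embed(message)        # (B, dim)
        c = self.img_embed(images)         # (B, dim)
        return m @ c.t()                   # (B, B): scores over all candidates

def game_step(sender, receiver, images, tau=1.0):
    logits = sender(images)
    # Gumbel-Softmax keeps the discrete channel differentiable; swap in
    # REINFORCE or the raw logits (continuous channel) as alternatives.
    message = F.gumbel_softmax(logits, tau=tau, hard=True)
    scores = receiver(message, images)
    # Target for image i is candidate i: Receiver must pick Sender's input.
    labels = torch.arange(images.size(0), device=images.device)
    return F.cross_entropy(scores, labels)
```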
The MNIST autoencoder can be used as starting code here.
The only (moderately) tricky part would be implementing the contrastive loss.
We can use the prediction head's accuracy (how often Receiver picks the correct image) as an evaluation metric here.
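A minimal sketch of a batch-wise contrastive (InfoNCE-style) loss, with the accuracy metric computed alongside it; the temperature value is illustrative:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(msg_z, img_z, temperature=0.1):
    # InfoNCE over the batch: row i of the similarity matrix should put
    # its mass on column i (the matching image); all other columns act
    # as negatives. Accuracy doubles as the evaluation metric.
    msg_z = F.normalize(msg_z, dim=-1)
    img_z = F.normalize(img_z, dim=-1)
    logits = msg_z @ img_z.t() / temperature   # (B, B) cosine similarities
    labels = torch.arange(logits.size(0), device=logits.device)
    loss = F.cross_entropy(logits, labels)
    acc = (logits.argmax(dim=-1) == labels).float().mean()
    return loss, acc
```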
It will probably work better if Sender has some form of convolutional architecture (LeNet3?).
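For instance, a LeNet-style convolutional Sender body for 28x28 MNIST digits (layer sizes are illustrative):

```python
import torch.nn as nn

class ConvSender(nn.Module):
    # LeNet-style convolutional encoder feeding a symbol vocabulary.
    def __init__(self, vocab_size=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.to_logits = nn.Linear(16 * 5 * 5, vocab_size)  # 16x5x5 after pooling

    def forward(self, image):
        return self.to_logits(self.conv(image))
```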
We'd expect that (a) agents can overfit perfectly with a continuous channel (in the absence of augmentations), and (b) they achieve above-chance accuracy with Gumbel-Softmax and REINFORCE.
v1: ImageNet instead of MNIST. Linear probes as the evaluation; ResNet as the underlying architecture.
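A sketch of linear probing, the standard self-supervised evaluation: freeze the trained encoder and fit only a linear classifier on its features (feat_dim and num_classes are placeholders):

```python
import torch.nn as nn

def linear_probe(encoder, feat_dim, num_classes=1000):
    # Freeze the emergent-communication encoder; only the linear layer trains.
    for p in encoder.parameters():
        p.requires_grad = False
    encoder.eval()  # keep batch-norm statistics fixed during probing
    return nn.Sequential(encoder, nn.Linear(feat_dim, num_classes))
```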
v2: Augmentations. Approach the actual SimCLR's performance with a continuous channel.
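The SimCLR augmentation family could look roughly like this in torchvision (exact parameters are illustrative; GaussianBlur requires a recent torchvision):

```python
from torchvision import transforms

simclr_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    # Colour distortion applied with probability 0.8, as in the SimCLR paper.
    transforms.RandomApply([transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])
```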
Useful resources
[1] SimCLR paper: https://arxiv.org/pdf/2002.05709v3.pdf
[2] SimCLR implementations on GitHub for inspiration: https://github.com/Spijkervet/SimCLR and https://github.com/lucidrains/contrastive-learner; a blog post: https://sthalles.github.io/simple-self-supervised-learning/