gaasher / I-JEPA

Implementation of I-JEPA from "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture"
MIT License

Using same context and targets for all the images in a batch #6

Open jhairgallardo opened 1 year ago

jhairgallardo commented 1 year ago

Hi! I see that you use the same context and target indices for all the images in a batch. I guess doing that gives you a tensor of the same size for each image when passing through the predictor (so the batch can be stacked). I saw that here (a different implementation) they used torch.vmap to apply different context and target indices to each image, but it is not clear how I could use that in your code. They claim that the loss increases exponentially if you use the same indices for every image in a batch. I have also seen this in some of my experiments, though the network still performed well on linear evaluation at the end. Do you have any ideas on how to make the context and target indices different for each image in a batch in your code?
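For reference, this is roughly what a per-image gather could look like. This is only a minimal sketch, not the repository's code or the vmap implementation linked above; the shapes, `num_ctx`, and the random index sampling are placeholder assumptions.

```python
import torch

B, N, D = 4, 196, 192          # batch size, number of patches, embedding dim (assumed)
num_ctx = 98                   # context patches per image, kept fixed so the batch stacks

x = torch.randn(B, N, D)       # patch embeddings for the batch
ctx_idx = torch.stack([        # a different index set for every image
    torch.randperm(N)[:num_ctx] for _ in range(B)
])                             # (B, num_ctx)

# Per-image gather: expand the indices over the embedding dim and gather along
# the patch axis, so each image is indexed with its own context set.
ctx = torch.gather(x, 1, ctx_idx.unsqueeze(-1).expand(-1, -1, D))   # (B, num_ctx, D)

# The same selection written with torch.vmap over the batch dimension.
ctx_vmap = torch.vmap(lambda xi, idx: torch.index_select(xi, 0, idx))(x, ctx_idx)
assert torch.allclose(ctx, ctx_vmap)
```

The key constraint either way is that the number of context/target patches stays the same across images, otherwise the per-image results cannot be stacked back into a single tensor.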

gaasher commented 1 year ago

I feel like the easiest way to get image-level context/target indices would actually be to compute the context/target blocks in the dataset, since the dataset already fetches one image at a time. You could pre-compute what lines 87-115 and 118-135 do inside the dataset's `__getitem__`. Then you would just have to compute the target block embeddings in the same way as lines 154-156. Unfortunately, I don't have the time to do this in the near term, but I hope this comment helps; feel free to keep asking questions and I'll answer them to the best of my ability.
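A rough sketch of that idea, sampling the blocks per image inside `__getitem__`. This is not the repository's actual code: the class name, the block sampler (random index sets instead of contiguous rectangular blocks), and the size parameters are all placeholder assumptions for the logic currently done at the batch level.

```python
import torch
from torch.utils.data import Dataset

class IJEPADataset(Dataset):
    def __init__(self, images, num_patches=196, num_targets=4,
                 target_size=9, context_size=98):
        self.images = images
        self.num_patches = num_patches
        self.num_targets = num_targets
        self.target_size = target_size    # patches per target block (assumed fixed)
        self.context_size = context_size  # patches kept in the context (assumed fixed)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        img = self.images[i]

        # Per-image target blocks: random index sets of fixed size here; a real
        # sampler would draw contiguous rectangular blocks as in the paper.
        target_idx = torch.stack([
            torch.randperm(self.num_patches)[: self.target_size]
            for _ in range(self.num_targets)
        ])                                           # (num_targets, target_size)

        # Context = patches not used by any target, truncated to a fixed length
        # so the default collate can stack samples into a batch.
        used = set(torch.unique(target_idx.flatten()).tolist())
        remaining = torch.tensor([p for p in range(self.num_patches) if p not in used])
        context_idx = remaining[torch.randperm(len(remaining))][: self.context_size]

        return img, context_idx, target_idx
```

In the training loop you would then gather the context and target embeddings per image (e.g. with `torch.gather` along the patch axis) instead of slicing the whole batch with one shared index set.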