facebookresearch / ijepa

Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive architecture."

Question about dinov2 vs ijepa #66

Open lanalex opened 5 days ago

lanalex commented 5 days ago

Hello,

Conceptually, both DINOv2 and I-JEPA provide latent-space representations of images. DINOv2 relies heavily on augmentations and multi-view generation, while I-JEPA does not. As far as I can see, the primary advantage of the pretrained DINOv2 weights is that they were trained on far more images. Why did Facebook choose to scale up DINOv2 rather than the I-JEPA architecture? Are there advantages to the former? And are there benchmarks comparing the two when trained on the same unsupervised pretraining set?
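
To make "latent representation" concrete, here is a rough sketch of how one might pull an image embedding from each model. The `dinov2_vitb14` torch.hub entrypoint is the one documented by the DINOv2 repo; on the I-JEPA side, the `vit_huge` constructor arguments, the checkpoint filename, and the `target_encoder` / `module.` state-dict keys are my assumptions about this codebase, not verified API.

```python
# Sketch: image-level embeddings from DINOv2 vs. I-JEPA (assumptions noted inline).
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
x = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # [1, 3, 224, 224]

# DINOv2: forward() returns a single image-level embedding (the CLS token).
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()
with torch.no_grad():
    z_dinov2 = dinov2(x)                                  # [1, 768]

# I-JEPA: the encoder outputs patch-level features with no CLS token, so an
# image-level representation is usually taken as the mean over patch tokens.
from src.models.vision_transformer import vit_huge        # assumed constructor name
encoder = vit_huge(patch_size=14).eval()                  # assumed arguments
ckpt = torch.load("IN1K-vit.h.14-300e.pth.tar", map_location="cpu")  # example checkpoint path
# assumed checkpoint layout: DDP-prefixed state dict under "target_encoder"
encoder.load_state_dict({k.replace("module.", ""): v
                         for k, v in ckpt["target_encoder"].items()})
with torch.no_grad():
    z_ijepa = encoder(x).mean(dim=1)                      # [1, N_patches, D] -> [1, D]
```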

Thank you!

bumi001 commented 3 days ago

In Table 4, page 12 of the DINOv2 paper, the top-1 accuracy for DINO v1 with a linear probe is 79.2. On the other hand, in Table 2 of the I-JEPA paper the top-1 accuracy for DINO v1 is 70. This discrepancy could possibly be due to the I-JEPA paper reporting results at 300 epochs and the DINOv2 paper reporting results at 500 epochs.

My question is the following. At 300 epochs, I-JEPA reaches a top-1 accuracy of 77.3, which is 7.3 points higher than DINO v1. If you ran I-JEPA for 500 epochs, would it still be ahead of DINO v1 by 7.3 points?

Another question: what happens if you train I-JEPA on the LVD-142M dataset for 500 epochs? Would it be superior to DINOv2?
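
For reference, the linear-probe numbers being compared above are top-1 accuracies of a single linear classifier trained on frozen features. Below is a minimal sketch of that protocol, with scikit-learn's `LogisticRegression` standing in for the papers' SGD/LARS probes; the feature arrays are assumed to come from a frozen I-JEPA or DINOv2 encoder.

```python
# Minimal linear-probe sketch: fit a linear classifier on frozen features and
# report top-1 accuracy. Both papers use heavier protocols (SGD/LARS probes,
# augmentation, longer schedules); this only illustrates the idea.
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe_top1(train_feats, train_labels, val_feats, val_labels):
    """train_feats/val_feats: [N, D] frozen embeddings; labels: [N] integer classes."""
    clf = LogisticRegression(max_iter=1000)        # stand-in for the papers' linear probe
    clf.fit(train_feats, train_labels)
    preds = clf.predict(val_feats)
    return float(np.mean(preds == val_labels))     # top-1 accuracy

# Usage (features extracted beforehand from a frozen encoder):
# acc = linear_probe_top1(f_train, y_train, f_val, y_val)
```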