Closed — Uljibuh closed this issue 5 months ago
I'd need more information on the desired encoder size and the number of images in your dataset. Generally speaking, transformer-based methods like I-JEPA are data-intensive and require substantial compute. Furthermore, I-JEPA needs a self-supervised pretraining stage, which makes it even more data- and compute-hungry.
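The self-supervised stage mentioned above is what drives much of the compute cost: I-JEPA trains a context encoder by gradient descent while maintaining a separate target encoder updated only by an exponential moving average (EMA). As a framework-agnostic illustration, here is a minimal NumPy sketch of that EMA update; the toy linear "encoders", shapes, and the momentum value `0.996` are assumptions for demonstration, not values taken from this repository.

```python
# Minimal sketch of a JEPA-style EMA target-encoder update (NumPy only,
# so the idea is framework-agnostic). Shapes and the momentum value are
# illustrative assumptions, not values from this repository.
import numpy as np

rng = np.random.default_rng(0)

# Toy "encoders": one weight matrix each. In the real model these are
# ViT backbones; the target encoder receives no gradient updates.
dim = 8
context_encoder = rng.normal(size=(dim, dim))
target_encoder = context_encoder.copy()

def ema_update(target, online, momentum=0.996):
    """Exponential moving average: target <- m * target + (1 - m) * online."""
    return momentum * target + (1.0 - momentum) * online

# One simulated training step: pretend the context encoder moved slightly
# after a gradient update, then refresh the target encoder by EMA.
context_encoder += 0.01 * rng.normal(size=(dim, dim))
target_encoder = ema_update(target_encoder, context_encoder)

# The target encoder lags the online encoder rather than copying it.
gap = np.abs(target_encoder - context_encoder).mean()
print(f"mean |target - context| after one EMA step: {gap:.4f}")
```

Because every pretraining step touches both encoders and large masked image batches, the wall-clock and memory cost scales with encoder size, which is why those two numbers matter for estimating requirements.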
Thank you for your implementation; your code is clean and easy to read.
I am wondering how much data and how many GPUs are needed to train a small yet accurate I-JEPA?