gaasher / I-JEPA

Implementation of I-JEPA from "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture"
MIT License

V-JEPA, Expansion, License #11

Open Sharpz7 opened 7 months ago

Sharpz7 commented 7 months ago

Hey, really like what you have done, but a few questions:

My final concern is the license. Something like MIT is definitely ideal (see https://opensource.org/licenses), but I am not sure whether you are actually allowed to put it under this license.

Reading the terms of the original I-JEPA license (https://github.com/facebookresearch/ijepa?tab=License-1-ov-file#readme), I think you would need to put this code under the same license if it is derived from the code they shared. I am not sure about the case where you have just used the paper to build your own codebase.

My collaboration would depend on the license being MIT or equivalent, since what I can do with anything more restrictive is limited (https://opensource.org/osd).

Thanks again for your efforts and I hope to hear from you soon ;))

gaasher commented 7 months ago

@Sharpz7, I would love to collaborate on extending this to V-JEPA! I currently have an MIT license attached, since all the code is mine - I just transcribed what I read in the paper. In fact, I published this code before any official implementation was available. What do you envision the final result looking like for easy inference? Also, I'm pretty busy right now, so I don't have much time to work on this repo further. With that said - what can I do to help you work fastest/best?

Sharpz7 commented 7 months ago

Hey! Thanks for the quick response, that is great!

I don't have much bandwidth right now either, but I will keep track of this for when I do, because I think it would be a great contribution to the community.

Does your implementation include the conditional latent variable z from the JEPA architecture? I have struggled to find any references on what that actually is outside of the original H-JEPA paper.

In terms of "easy inference" - I would love to build out the codebase to support using JEPA in different ways - from the CLI, as a Python import, etc. - as well as having sensible defaults, easy data processing, and connections to things like TensorBoard, W&B, and MLflow. These things, although individually small, would make it much easier for a software engineer with limited ML experience to build cool things with JEPA. I've sketched the kind of API I mean below.
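
Every name in this sketch is hypothetical - nothing like it exists in the repo yet, it's just to show the shape I'm imagining:

```python
# Hypothetical API sketch - none of these names exist in the repo yet.
from ijepa import load_pretrained  # hypothetical import

# Sensible defaults: one line to get a model, one line to get embeddings.
model = load_pretrained("ijepa-base")          # hypothetical checkpoint id
embeddings = model.embed("path/to/image.png")  # e.g. patch embeddings

# Opt-in experiment tracking, picked by name.
trainer = model.trainer(logger="wandb")        # or "tensorboard", "mlflow"
trainer.fit(data_dir="path/to/images")
```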

On top of that, documentation on how to build decoders, and an easy way to construct a decoder from this codebase, would also be cool - even something as small as the probe sketched below.
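
This assumes (hypothetically) that the encoder returns patch embeddings of shape (batch, num_patches, dim), and just bolts a trainable head onto the frozen encoder:

```python
import torch
import torch.nn as nn

# Sketch of a minimal "decoder": a linear probe on a frozen I-JEPA encoder.
# Assumes encoder(images) -> (batch, num_patches, dim); names are placeholders.
class LinearProbe(nn.Module):
    def __init__(self, encoder: nn.Module, dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False   # keep the pretrained encoder frozen
        self.head = nn.Linear(dim, num_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            patches = self.encoder(images)   # (B, N, D)
        pooled = patches.mean(dim=1)         # average-pool the patch tokens
        return self.head(pooled)             # (B, num_classes)
```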

If you agree, maybe I could open a ticket about scoping all this work into smaller "bite-sized" chunks?

gaasher commented 7 months ago

Sorry for the late reply - just started classes again for this term.

My implementation does not use the z variable. My understanding is that it's just a stand-in to express the fact that JEPA models don't learn an explicit probability distribution the way generative models do; the predictor simply regresses target embeddings from context embeddings, roughly as in the sketch below. Also, your plan sounds good. We could even work to integrate this with Hugging Face. Feel free to divvy this up into bite-sized tickets as you mentioned.
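
A simplified sketch of that objective (not the exact code in this repo; the encoder/predictor signatures are placeholders):

```python
import torch
import torch.nn.functional as F

# Simplified sketch of the I-JEPA objective: predict the target encoder's
# embeddings of masked patches from the context encoder's embeddings.
# No latent z, no pixel reconstruction, no explicit probability distribution.
def ijepa_step(context_encoder, target_encoder, predictor,
               images, context_mask, target_idx):
    with torch.no_grad():
        # Targets come from an EMA copy of the context encoder.
        targets = target_encoder(images)[:, target_idx]      # (B, T, D)
    context = context_encoder(images, mask=context_mask)     # (B, C, D)
    preds = predictor(context, target_idx)                   # (B, T, D)
    return F.mse_loss(preds, targets)  # L2-style loss in embedding space
```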