Open Alookima21 opened 3 weeks ago
Yeah! That's the main goal of this project, although my task was object detection. You can take a look of the implementation in https://github.com/KostadinovShalon/MVViT/tree/1031c7730bf5b51923103126f627027a7a6592fa/models/transformers for this goal.
thank you. I have a follow up question. Is there any training code which allows me to train the encoder in isolation without adding a downstream task to it?
I am working on a project where I want to input multiple views and get one encoding which I will then use to reconstruct a 3D structure from the images. Could I use this model just for getting a single encoding given multiple views?