typercast closed this issue 2 years ago.
It should be, as long as inference works like any normal nn.Module. Give it a try: change the final embedding layer so its output matches your text encoder's embedding dimension, and tell me how it goes!
I'll reopen this if you run into any problems. :)
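Here's a minimal sketch of what that could look like in PyTorch. It assumes the text encoder produces 512-dimensional embeddings (swap in your real size) and that the Hugging Face backbone returns a `pooler_output`, which models like `ViTModel` do. `ImageEncoderWithProjection` and the toy backbone are hypothetical names for illustration, not part of this repo. The toy backbone only exists so the example runs offline.

```python
import torch
from torch import nn


class ImageEncoderWithProjection(nn.Module):
    """Hypothetical wrapper: an image backbone plus a new final embedding
    layer sized to match the text encoder's embedding dimension."""

    def __init__(self, backbone: nn.Module, backbone_dim: int, text_embed_dim: int):
        super().__init__()
        self.backbone = backbone
        # Replacement final embedding layer: backbone features -> text embedding size.
        self.proj = nn.Linear(backbone_dim, text_embed_dim)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        out = self.backbone(pixel_values)
        # Hugging Face vision models return a ModelOutput; assume it has a pooled
        # (batch, hidden) `pooler_output`. Plain modules are assumed to return a tensor.
        pooled = getattr(out, "pooler_output", None)
        features = pooled if pooled is not None else out
        return self.proj(features)


# With Hugging Face (example checkpoint; needs `transformers` and a download):
#   from transformers import AutoModel
#   vit = AutoModel.from_pretrained("google/vit-base-patch16-224-in21k")
#   model = ImageEncoderWithProjection(vit, vit.config.hidden_size, text_embed_dim=512)

# Tiny stand-in backbone so this sketch runs offline; outputs (batch, 16).
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
model = ImageEncoderWithProjection(toy_backbone, backbone_dim=16, text_embed_dim=512)

# Inference works like any other nn.Module.
model.eval()
with torch.no_grad():
    embeddings = model(torch.randn(4, 3, 32, 32))
print(embeddings.shape)  # torch.Size([4, 512])
```

Because the wrapper only needs the backbone to be callable on a pixel tensor, the same class should work whether you pass a pretrained Hugging Face model or your own module.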
Is it possible to use a pre-trained image model from Hugging Face for fine-tuning? The latest models are usually there, so it would be pretty cool if they were compatible.