THUDM / ImageReward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Apache License 2.0

Training Novel Concepts #71

Open adamgeddon1686 opened 7 months ago

adamgeddon1686 commented 7 months ago

Is there a way to train novel concepts into your BLIP model, similar to the way textual inversion works for Stable Diffusion image generation? If so, is there a training script provided, or would one need to be created?

Also, there have been some recent innovations in computer vision that might prove useful, though I don't know how much your model would need to be altered to use them. Kosmos-2 by Microsoft, for instance, has proved very promising at creating image captions, much better than the previous BLIP model I had used. Maybe a more powerful language model would overcome some of BLIP's shortcomings in identifying novel concepts. Further, there are new ways for these kinds of computer vision models to scan an image, such as SAHI (Slicing Aided Hyper Inference), that allow them to find smaller objects in larger images. I have provided both links below for you to look at.

https://huggingface.co/docs/transformers/main/en/model_doc/kosmos-2
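For reference, a minimal captioning sketch closely following the Hugging Face Kosmos-2 docs; the image path is a placeholder:

```python
# Sketch of grounded image captioning with Kosmos-2 via Hugging Face
# transformers; "test.jpg" is a placeholder path.
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

model = Kosmos2ForConditionalGeneration.from_pretrained("microsoft/kosmos-2-patch14-224")
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")

image = Image.open("test.jpg")
# The <grounding> token asks the model to tie caption phrases to image regions.
inputs = processor(text="<grounding>An image of", images=image, return_tensors="pt")

generated_ids = model.generate(
    pixel_values=inputs["pixel_values"],
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    image_embeds_position_mask=inputs["image_embeds_position_mask"],
    max_new_tokens=64,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Strip the grounding markup, keeping the plain caption plus detected entities.
caption, entities = processor.post_process_generation(generated_text)
print(caption)
```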

https://github.com/obss/sahi
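And a minimal sliced-inference sketch based on the obss/sahi README; the YOLOv8 weights path and slice sizes are placeholders:

```python
# Sketch of SAHI sliced inference: tile a large image, detect per tile,
# then merge predictions back into full-image coordinates so small
# objects are not lost at full resolution.
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="yolov8n.pt",  # placeholder weights
    confidence_threshold=0.3,
    device="cpu",
)

result = get_sliced_prediction(
    "large_image.jpg",
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
print(result.object_prediction_list)
```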

xujz18 commented 7 months ago

Thank you so much for the discussion and for sharing! Regarding the first question, training new concepts into the model: we think new scripts would be needed for further training. Regarding the new research directions you propose, such as MLLMs, we think they are very much worth trying.
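As a starting point, here is a minimal sketch (not an official script) of the pairwise ranking objective described in the ImageReward paper, which such a further-training script would optimize. `reward_model`, `preferred`, and `rejected` are hypothetical stand-ins: the released ImageReward class would need its forward pass exposed for gradients.

```python
# Sketch of a further-training step for a preference reward model.
# The loss matches the pairwise ranking objective from the ImageReward
# paper: -log sigmoid(r_w - r_l), pushing the preferred image's score
# above the rejected image's score for the same prompt.
import torch
import torch.nn.functional as F

def preference_loss(score_preferred: torch.Tensor,
                    score_rejected: torch.Tensor) -> torch.Tensor:
    return -F.logsigmoid(score_preferred - score_rejected).mean()

def train_step(reward_model, optimizer, preferred, rejected):
    # `preferred` / `rejected` are batches of (prompt, image) inputs the
    # model scores; both are hypothetical placeholders in this sketch.
    optimizer.zero_grad()
    loss = preference_loss(reward_model(**preferred), reward_model(**rejected))
    loss.backward()
    optimizer.step()
    return loss.item()
```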