THUDM / ImageReward

[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
Apache License 2.0

Training Novel Concepts #71

Open adamgeddon1686 opened 7 months ago

adamgeddon1686 commented 7 months ago

Is there a way to train novel concepts into your BLIP model, similar to the way textual inversion works for Stable Diffusion image generation? If so, is there a training script provided, or would one need to be created?

Also, there have been some recent innovations in computer vision that might prove useful, though I don't know how much your model would need to be altered to use them. Kosmos-2 by Microsoft, for instance, has proved very promising at creating image captions, much better than the previous BLIP model I had used. Maybe a more powerful language model would overcome some of BLIP's shortcomings in identifying novel concepts. Further, there are new ways for these kinds of computer vision models to scan an image, such as SAHI (Slicing Aided Hyper Inference), that allow them to find smaller objects in larger images. I have provided both links below for you to look at.

https://huggingface.co/docs/transformers/main/en/model_doc/kosmos-2
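For reference, a minimal captioning sketch closely following the Hugging Face Kosmos-2 docs; the image path is a placeholder:

```python
# Sketch of grounded image captioning with Kosmos-2 via Hugging Face
# transformers; "test.jpg" is a placeholder path.
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

model = Kosmos2ForConditionalGeneration.from_pretrained("microsoft/kosmos-2-patch14-224")
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")

image = Image.open("test.jpg")
# The <grounding> token asks the model to tie caption phrases to image regions.
inputs = processor(text="<grounding>An image of", images=image, return_tensors="pt")

generated_ids = model.generate(
    pixel_values=inputs["pixel_values"],
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    image_embeds_position_mask=inputs["image_embeds_position_mask"],
    max_new_tokens=64,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
# Strip the grounding markup, keeping the plain caption plus detected entities.
caption, entities = processor.post_process_generation(generated_text)
print(caption)
```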

https://github.com/obss/sahi
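And a minimal sliced-inference sketch based on the obss/sahi README; the YOLOv8 weights path and slice sizes are placeholders:

```python
# Sketch of SAHI sliced inference: tile a large image, detect per tile,
# then merge predictions back into full-image coordinates so small
# objects are not lost at full resolution.
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",
    model_path="yolov8n.pt",  # placeholder weights
    confidence_threshold=0.3,
    device="cpu",
)

result = get_sliced_prediction(
    "large_image.jpg",
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
print(result.object_prediction_list)
```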

xujz18 commented 7 months ago

Thank you so much for the discussion and for sharing! Regarding the first question, training new concepts into the model: we think new scripts would be needed for further training. Regarding the new research directions you propose, such as MLLMs, we think they are very much worth trying.
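As a starting point, here is a minimal sketch (not an official script) of the pairwise ranking objective described in the ImageReward paper, which such a further-training script would optimize. `reward_model`, `preferred`, and `rejected` are hypothetical stand-ins: the released ImageReward class would need its forward pass exposed for gradients.

```python
# Sketch of a further-training step for a preference reward model.
# The loss matches the pairwise ranking objective from the ImageReward
# paper: -log sigmoid(r_w - r_l), pushing the preferred image's score
# above the rejected image's score for the same prompt.
import torch
import torch.nn.functional as F

def preference_loss(score_preferred: torch.Tensor,
                    score_rejected: torch.Tensor) -> torch.Tensor:
    return -F.logsigmoid(score_preferred - score_rejected).mean()

def train_step(reward_model, optimizer, preferred, rejected):
    # `preferred` / `rejected` are batches of (prompt, image) inputs the
    # model scores; both are hypothetical placeholders in this sketch.
    optimizer.zero_grad()
    loss = preference_loss(reward_model(**preferred), reward_model(**rejected))
    loss.backward()
    optimizer.step()
    return loss.item()
```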