cloneofsimo / lora

Using Low-rank adaptation to quickly fine-tune diffusion models.
https://arxiv.org/abs/2106.09685
Apache License 2.0
6.94k stars, 479 forks

Support prefix text for blip captioning #166

Closed by levi 1 year ago

levi commented 1 year ago

Adds support for a starting string (prefix) for BLIP caption generation. This works really well when you know the images all share a particular concept or style.

Without:

brown eyed man smiling at camera in white t - shirt with a smile on his face
man in green shirt with blue eyes and a smile on his face
taken from the top, a man with a big face
someone is sitting on the bed and holding a bowl in front of a tv

With "a photo of":

a photo of a man in a white shirt sitting on a couch
a photo of a man smiling while holding a toothbrush
a photo of a man in a green shirt is holding a doughnut
a photo of a man is sitting on the ground

It can be even more accurate when the subject is explicitly defined, e.g. "a photo of a man".
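For readers who want to try the idea outside this repo, here is a minimal sketch of prefix-conditioned captioning using the Hugging Face `transformers` BLIP classes. It is not the code from this PR: the helper names `load_blip` and `caption_with_prefix`, the default model id, and the `max_length` value are assumptions chosen for illustration.

```python
from typing import Any, Optional


def load_blip(model_id: str = "Salesforce/blip-image-captioning-large"):
    """Load a BLIP processor and captioning model (assumed model id).

    transformers is imported lazily so caption_with_prefix can be used
    with any objects exposing the same interface.
    """
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def caption_with_prefix(
    image: Any,
    processor: Any,
    model: Any,
    prefix: Optional[str] = None,
    max_length: int = 50,  # illustrative value, not taken from the PR
) -> str:
    """Caption one image, optionally seeding generation with a prefix.

    With a prefix such as "a photo of", the prefix tokens are passed to the
    model together with the image, and generation continues from that text.
    """
    if prefix:
        inputs = processor(image, text=prefix, return_tensors="pt")
    else:
        inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs, max_length=max_length)
    return processor.decode(out[0], skip_special_tokens=True)


# Hypothetical usage (requires transformers, torch and Pillow; the file name
# is a placeholder):
#
#   from PIL import Image
#   processor, model = load_blip()
#   image = Image.open("example.jpg").convert("RGB")
#   print(caption_with_prefix(image, processor, model, prefix="a photo of a man"))
```

Keeping the prefix optional means the same helper covers both the "without" and "with" cases shown above.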

cloneofsimo commented 1 year ago

Thanks!