Support prefix text for blip captioning

Provides a starting string (a prefix) that the BLIP captioner continues from when generating a caption. This helps most when you know the images all share a particular concept or style.

Without:

- brown eyed man smiling at camera in white t - shirt with a smile on his face
- man in green shirt with blue eyes and a smile on his face
- taken from the top, a man with a big face
- someone is sitting on the bed and holding a bowl in front of a tv

With "a photo of":

- a photo of a man in a white shirt sitting on a couch
- a photo of a man smiling while holding a toothbrush
- a photo of a man in a green shirt is holding a doughnut
- a photo of a man is sitting on the ground

Captions can be more accurate still when the prefix names the subject explicitly, e.g. "a photo of a man".
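For reference, here is a minimal sketch of prefix-conditioned captioning using the Hugging Face `transformers` BLIP classes. It is not the code from this PR: the helper names `load_blip` and `caption_with_prefix`, the checkpoint id, and the `max_length` default are all assumptions chosen for illustration.

```python
from typing import Any, Optional


def load_blip(model_id: str = "Salesforce/blip-image-captioning-large"):
    """Load a BLIP captioning checkpoint (assumed id; downloads weights on first use)."""
    # Imported lazily so caption_with_prefix can be exercised without transformers.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForConditionalGeneration.from_pretrained(model_id)
    return processor, model


def caption_with_prefix(
    image: Any,
    processor: Any,
    model: Any,
    prefix: Optional[str] = None,
    max_length: int = 50,
) -> str:
    """Caption one image, optionally seeding generation with a prefix string."""
    if prefix:
        # Conditional captioning: the prefix tokens become the start of the caption.
        inputs = processor(image, text=prefix, return_tensors="pt")
    else:
        # Unconditional captioning: the model picks its own opening words.
        inputs = processor(image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_length=max_length)
    return processor.decode(output_ids[0], skip_special_tokens=True)


# Hypothetical usage (the file name is a placeholder):
#   from PIL import Image
#   processor, model = load_blip()
#   image = Image.open("example.jpg").convert("RGB")
#   print(caption_with_prefix(image, processor, model, prefix="a photo of a man"))
```

Passing `prefix=None` gives the unprefixed behaviour, so the two modes compared above can be run on the same images by toggling a single argument.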

cloneofsimo/lora #166