crowsonkb / v-diffusion-pytorch

v objective diffusion inference code for PyTorch.
MIT License

what does this line mean in README? #16

Closed by illtellyoulater 2 years ago

illtellyoulater commented 2 years ago

"A weight of 1 will sample images that match the prompt roughly as well as images usually match prompts like that in the training set."

I can't wrap my head around this sentence. Could you please explain it with different wording? Thanks!

crowsonkb commented 2 years ago

One recurring problem we have with text-to-image models is that our paired image/text datasets just aren't as good as we would like: the images and captions rarely match very well. So we use the "classifier-free guidance" trick to sample conditional on stronger image/caption matching than was typical in the training set. If you use a weight of 1, the image will tend to match the caption about as well as images matched captions in the training set. Weights above 1 push the sample toward matching the caption more closely than that.
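To make the role of the weight concrete, here is a minimal sketch of the usual classifier-free guidance combination. It is an illustration under assumptions, not necessarily the exact code in this repository: `cfg_combine` and the toy values are hypothetical, and plain lists stand in for the model's unconditional and conditional outputs.

```python
def cfg_combine(uncond, cond, weight):
    """Blend unconditional and conditional model outputs.

    weight = 0 -> unconditional output only
    weight = 1 -> conditional output only (training-set level of matching)
    weight > 1 -> extrapolate past the conditional output
    """
    return [u + weight * (c - u) for u, c in zip(uncond, cond)]


# Toy stand-ins for the two model outputs (hypothetical values).
uncond = [0.0, 1.0, 2.0]
cond = [1.0, 1.0, 4.0]

at_one = cfg_combine(uncond, cond, 1.0)    # equals cond
at_three = cfg_combine(uncond, cond, 3.0)  # pushed further away from uncond
print(at_one)
print(at_three)
```

With a weight of 1 the result is just the conditional output, so nothing is amplified. Larger weights move further in the direction that separates "with caption" from "without caption".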

illtellyoulater commented 2 years ago

many thanks!