illtellyoulater closed this issue 2 years ago
One recurring problem we have with text-to-image models is that our paired image/text datasets just aren't as good as we would like: the images and captions rarely match very well. So we use this "classifier-free guidance" trick to sample conditional on stronger image/caption matching than was typical in the training set. If you use a value of 1, the image will tend to match the caption about as well as images matched captions in the training set; larger values push the sample toward a closer match than the training data typically had.
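In case a concrete version helps, here is a minimal sketch of one common way classifier-free guidance is written. It is not taken from this repository: the function name `cfg_combine`, the plain-list representation, and the exact formula are assumptions for illustration. The idea is that the model makes two predictions at each step, one without the caption and one with it, and the guidance value decides how far to move from the first toward (or past) the second.

```python
def cfg_combine(uncond, cond, scale):
    """Blend unconditional and caption-conditioned predictions.

    Hypothetical helper for illustration only.
    uncond: the model's prediction with no caption (list of floats)
    cond:   the model's prediction given the caption (list of floats)
    scale:  guidance value; 0 ignores the caption, 1 gives the plain
            conditional prediction, values above 1 extrapolate past it
    """
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy numbers standing in for real model outputs.
uncond = [0.0, 1.0]
cond = [1.0, 3.0]

print(cfg_combine(uncond, cond, 0.0))  # same as uncond
print(cfg_combine(uncond, cond, 1.0))  # same as cond
print(cfg_combine(uncond, cond, 3.0))  # pushed beyond cond, away from uncond
```

With a value of 1 you get exactly the caption-conditioned prediction, which is why the match is about as good as in the training set. Values above 1 exaggerate whatever the caption changed, which is the "stronger matching" part.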
many thanks!
I can't wrap my head around this sentence. Could you please explain it with different wording? Thanks!