apapiu / guided-diffusion-keras

Text to Image Diffusion Models in Keras
Apache License 2.0

concatenate the conditional and unconditional inputs to speed inference #4

Open jakob1519 opened 1 year ago

jakob1519 commented 1 year ago

Hello,

I want to ask about this code in diffuser.py: why does it speed up inference? Could you explain it to me?

nn_inputs = [np.vstack([x_t, x_t]),
             np.vstack([noise_in, noise_in]),
             np.vstack([label, label_empty_ohe])]
apapiu commented 1 year ago

Hey! The speedup happens in the next line: x0_pred = self.denoiser.predict(nn_inputs, batch_size=self.batch_size). Here we only have to call .predict once on the concatenated matrix, which is faster than calling .predict twice, once on the conditional inputs and once on the unconditional inputs.
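
For illustration, here is a minimal sketch of the idea, written as a standalone function with the same names as the snippet above (denoiser, x_t, noise_in, label, label_empty_ohe, batch_size) rather than the repo's exact code:

import numpy as np

def predict_cond_and_uncond(denoiser, x_t, noise_in, label, label_empty_ohe, batch_size):
    n = len(x_t)
    # stack conditional and unconditional inputs along the batch axis
    nn_inputs = [np.vstack([x_t, x_t]),                # same noisy images twice
                 np.vstack([noise_in, noise_in]),      # same noise levels twice
                 np.vstack([label, label_empty_ohe])]  # real labels, then empty labels

    # a single .predict call on the 2*N batch: the per-call overhead is paid
    # once and the accelerator processes one larger, better-utilized batch
    x0_pred = denoiser.predict(nn_inputs, batch_size=batch_size)

    # first half is the conditional prediction, second half the unconditional one
    x0_pred_label = x0_pred[:n]
    x0_pred_no_label = x0_pred[n:]
    return x0_pred_label, x0_pred_no_label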

jakob1519 commented 1 year ago

Thank you! I got it!

And there is this part of the code in diffuser.py that I don't understand.

What is the difference between x0_pred_label and x0_pred_no_label?

# classifier free guidance:
x0_pred = self.class_guidance * x0_pred_label + (1 - self.class_guidance) * x0_pred_no_label

if self.perc_thresholding:
    # clip the prediction using dynamic thresholding a la Imagen:
    x0_pred = dynamic_thresholding(x0_pred, perc=self.perc_thresholding)
apapiu commented 1 year ago

x0_pred_label is the prediction conditioned on the text embedding, and x0_pred_no_label is the unconditional prediction (where the text embedding input is 0).
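
For reference, here is a rough sketch of what those two lines typically do; the guidance line matches the snippet above, while the dynamic_thresholding helper below is only an assumed implementation in the spirit of the Imagen paper and may differ from the repo's own:

import numpy as np

def apply_guidance(x0_pred_label, x0_pred_no_label, class_guidance):
    # classifier-free guidance: class_guidance = 1 returns the purely conditional
    # prediction; values > 1 extrapolate away from the unconditional prediction,
    # which usually makes samples follow the text embedding more closely.
    return class_guidance * x0_pred_label + (1 - class_guidance) * x0_pred_no_label

def dynamic_thresholding(x0_pred, perc=99.5):
    # dynamic thresholding a la Imagen (assumed form): clip each sample to its
    # perc-th percentile of absolute values and rescale, which keeps high
    # guidance scales from pushing pixel values far outside [-1, 1].
    s = np.percentile(np.abs(x0_pred), perc,
                      axis=tuple(range(1, x0_pred.ndim)), keepdims=True)
    s = np.maximum(s, 1.0)
    return np.clip(x0_pred, -s, s) / s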

jakob1519 commented 1 year ago

Got it! Thank you!