Equation 1 contains the term $\epsilon(z_t, \emptyset, I_{prompt})$, which is computed conditioned on an empty image and $I_{prompt}$.
However, in the inference code (line 895 in model/pipeline.py), `noise_pred_img` is not computed as $\epsilon(z_t, \emptyset, I_{prompt})$. The terms computed there are:
- `noise_pred_null` is $\epsilon(z_t, \emptyset, \emptyset)$,
- `noise_pred_text` is $\epsilon(z_t, I_e', I_{prompt})$,
- `noise_pred_img` is $\epsilon(z_t, I_e, \emptyset)$,
- `noise_pred_full` is $\epsilon(z_t, I_e, I_{prompt})$.
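To make the mismatch concrete, here is a small Python sketch that encodes the four conditioning pairs listed above as (image condition, prompt condition) and searches for the pair $(\emptyset, I_{prompt})$ that equation 1 requires. The string labels and the `find_term` helper are hypothetical, for illustration only; they are not taken from model/pipeline.py.

```python
# Conditioning pairs (image condition, prompt condition) for each noise
# prediction, as listed above. None stands for the empty condition (∅).
# The string labels are hypothetical placeholders, not real tensors.
NULL = None

computed_terms = {
    "noise_pred_null": (NULL, NULL),           # eps(z_t, ∅, ∅)
    "noise_pred_text": ("I_e'", "I_prompt"),   # eps(z_t, I_e', I_prompt)
    "noise_pred_img": ("I_e", NULL),           # eps(z_t, I_e, ∅)
    "noise_pred_full": ("I_e", "I_prompt"),    # eps(z_t, I_e, I_prompt)
}

# The term equation 1 asks for: empty image condition, I_prompt present.
required_term = (NULL, "I_prompt")


def find_term(terms, cond):
    """Return the names of computed predictions whose conditioning equals cond."""
    return [name for name, pair in terms.items() if pair == cond]


matches = find_term(computed_terms, required_term)
print(matches)  # [] : no computed prediction matches eps(z_t, ∅, I_prompt)
```

The empty result shows that none of the four predictions uses the $(\emptyset, I_{prompt})$ conditioning, while `noise_pred_img` is conditioned the other way round, on $(I_e, \emptyset)$.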
So, based on the above, can you explain why the code differs from equation 1 here?