Question about ISM loss in code

Thanks for the brilliant work!

I have a question about the implementation of the loss function. Based on the code linked below, is the target the unet_output at timestep t? If so, would the loss function be expressed as:

\[ \epsilon(x_t, t, y) - \epsilon(x_t, t, \emptyset) \]

instead of:

\[ \epsilon(x_s, s, \emptyset) \text{ subtracted from } \epsilon(x_t, t, y): \quad \epsilon(x_t, t, y) - \epsilon(x_s, s, \emptyset) \]

as mentioned in the paper? I want to ensure I haven't overlooked anything in my understanding.
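
To make the difference between the two expressions concrete, here is a toy sketch of what I mean. It is not the repository's code: `toy_eps` is a made-up scalar stand-in for the noise predictor, `None` plays the role of the empty prompt ∅, and the values of `x_s`, `s`, `x_t`, `t` are arbitrary. The only point is that the second term is evaluated at `(x_t, t)` in one reading and at `(x_s, s)` in the other.

```python
import math


def toy_eps(x, t, cond):
    """Hypothetical stand-in for eps(x, t, y); NOT the real UNet."""
    # cond=None stands for the empty (unconditional) prompt.
    bias = 0.0 if cond is None else 0.5
    return math.tanh(x) * (t / 1000.0) + bias


def residual_same_step(x_t, t, y):
    # eps(x_t, t, y) - eps(x_t, t, empty): both terms use (x_t, t).
    return toy_eps(x_t, t, y) - toy_eps(x_t, t, None)


def residual_two_step(x_t, t, x_s, s, y):
    # eps(x_t, t, y) - eps(x_s, s, empty): second term uses the earlier (x_s, s).
    return toy_eps(x_t, t, y) - toy_eps(x_s, s, None)


# Arbitrary illustrative values with s < t.
x_s, s = 0.3, 400
x_t, t = 0.8, 500

same_step = residual_same_step(x_t, t, "a prompt")
two_step = residual_two_step(x_t, t, x_s, s, "a prompt")
print(same_step, two_step)
```

In this toy setup the two residuals differ whenever the unconditional prediction at `(x_s, s)` differs from the one at `(x_t, t)`, which is why I would like to confirm which of the two the code is meant to compute.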

Thank you in advance for your time and assistance.

https://github.com/EnVision-Research/LucidDreamer/blob/2ecf0936617103107e4b20c34e94d204196f7a44/guidance/sd_utils.py#L278-L279


Question about ISM loss in code #45