In the training phase, noise is added via q_sample, and the loss is then computed between the model's prediction and the ground-truth label. In my understanding, xt should be compared against xt-1 when computing the loss, and x0 should be generated step by step for the final target detection. Why is the loss not computed against the next denoising step, but instead directly against the ground-truth label?
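For context, here is a minimal sketch (in NumPy, with hypothetical schedule values and a placeholder model) of the x0-prediction style of diffusion training the question seems to describe: because the forward process q_sample gives xt from x0 in closed form, the network can be trained to recover the clean label x0 directly from xt, rather than being supervised against xt-1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule (illustrative values, not taken from the repo in question).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    """Forward process: sample x_t directly from x_0 in closed form."""
    a = alphas_cumprod[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * noise

def model(xt, t):
    # Placeholder: a real network would predict x0 from (x_t, t).
    return xt

# One training step under the x0-prediction parameterization:
x0 = rng.standard_normal((4, 2))        # clean targets (e.g. GT boxes)
t = int(rng.integers(0, T))             # random timestep
noise = rng.standard_normal(x0.shape)
xt = q_sample(x0, t, noise)

x0_pred = model(xt, t)
loss = np.mean((x0_pred - x0) ** 2)     # loss against the label x0, not x_{t-1}
```

The usual reason this is valid: the posterior q(x_{t-1} | x_t, x_0) is tractable once x_0 is known, so a model that predicts x_0 (or equivalently the noise) determines the x_{t-1} step implicitly; supervising on x_0 at a random timestep is therefore equivalent training signal without unrolling the chain.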