I notice that in your pipeline, ip2p uses the original image (treated as the GT dataset) as the conditioning image. I wonder why the rendered image is not used as the condition instead.
In my own experiments, I trained the NeRF at half resolution. Before calling ip2p, I therefore had to resize both the rendered images and the original images to the same standardized size. However, I found that the resize operation left them poorly aligned, which led to unpredictable editing results. When I instead used the rendered image as the condition, the two inputs aligned well and the edits were reasonable.
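For context, here is a minimal sketch of the resize step I mean. The helper name and target size are my own (hypothetical, not from your code); the point is that applying the same interpolation settings to both the rendered and the original image reduces the sub-pixel misalignment I observed:

```python
import torch
import torch.nn.functional as F

def resize_for_ip2p(image: torch.Tensor, size=(512, 512)) -> torch.Tensor:
    """Resize a (C, H, W) image tensor to a standardized ip2p input size.

    Hypothetical helper: using identical interpolation mode and
    antialias settings for both the rendered image and the original
    (GT) image keeps their resampling consistent.
    """
    return F.interpolate(
        image.unsqueeze(0), size=size,
        mode="bilinear", align_corners=False, antialias=True,
    ).squeeze(0)

# Half-resolution render vs. full-resolution original: both end up 512x512,
# but they were resampled from different source grids, which is where the
# misalignment can creep in.
rendered = torch.rand(3, 256, 384)   # NeRF output at half resolution
original = torch.rand(3, 512, 768)   # original (GT) training image
cond = resize_for_ip2p(original)
ren = resize_for_ip2p(rendered)
assert cond.shape == ren.shape == (3, 512, 512)
```

Even with matched settings, resampling the two images from different source resolutions can leave a residual offset, which is why conditioning on the rendered image (already on the NeRF's grid) aligned better for me.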
Thanks for your great work!