Open snowflakewang opened 4 months ago
We use multiview high-resolution and low-resolution pairs. The multiview images come from Blender's rendering results for the Objaverse dataset.
Thank you for your reply! Do you mean rendering the Objaverse 3D dataset at two different resolutions (one relatively high and one relatively low) to construct data pairs?
Yes, we use a (256, 512) resolution pair for the first stage of super-resolution training. The 256-resolution portion is augmented by downsampling to a random resolution and then upsampling back to 256, along with some random noise, to obtain a 256-resolution image with artifacts. This allows the super-resolution model at this stage to correct some minor errors in generation.
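The degradation described above could be sketched roughly as follows. This is only an illustrative implementation, not the authors' actual code: the function name, the resolution range, and the noise level are assumptions.

```python
import random
import numpy as np
from PIL import Image

def degrade_256(hr_view: Image.Image, size: int = 256,
                min_res: int = 64, noise_std: float = 5.0) -> Image.Image:
    """Downsample a 256px render to a random resolution, upsample back,
    and add mild Gaussian noise, producing an artifact-laden input for
    the ControlNet-Tile branch. All ranges here are hypothetical."""
    low = random.randint(min_res, size)               # random intermediate resolution
    img = hr_view.resize((low, low), Image.BICUBIC)   # downsample
    img = img.resize((size, size), Image.BICUBIC)     # upsample back to 256
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, noise_std, arr.shape)  # random noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```

The degraded 256 image would then be paired with the clean 512 render as the (input, target) pair for super-resolution fine-tuning.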
Hello, thank you for your great work on high-resolution image-to-3D generation! I noticed that you utilized a ControlNet-Tile based on SD1.5 for the first stage of super-resolution. I am curious which data you used for fine-tuning. Fine-tuning a ControlNet usually requires data pairs (e.g., image-normal pairs, image-depth pairs, LR-HR pairs).
Thank you :)