Open snowflakewang opened 4 months ago
We use multiview high-resolution and low-resolution pairs. The multiview images come from Blender's rendering results for the Objaverse dataset.
Thank you for your reply! Do you mean rendering the Objaverse 3D dataset at two different resolutions (one relatively high and one relatively low) to construct data pairs?
Yes, we use a (256, 512) resolution pair for the first stage of super-resolution training. The 256-resolution portion is augmented by downsampling to a random resolution and then upsampling back to 256, along with some random noise, to obtain a 256-resolution image with artifacts. This allows the super-resolution model at this stage to correct some minor errors in generation.
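The degradation described above could be sketched roughly as follows. This is only an illustrative implementation, not the authors' actual code: the function name, the resolution range, and the noise level are assumptions.

```python
import random
import numpy as np
from PIL import Image

def degrade_256(hr_view: Image.Image, size: int = 256,
                min_res: int = 64, noise_std: float = 5.0) -> Image.Image:
    """Downsample a 256px render to a random resolution, upsample back,
    and add mild Gaussian noise, producing an artifact-laden input for
    the ControlNet-Tile branch. All ranges here are hypothetical."""
    low = random.randint(min_res, size)               # random intermediate resolution
    img = hr_view.resize((low, low), Image.BICUBIC)   # downsample
    img = img.resize((size, size), Image.BICUBIC)     # upsample back to 256
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, noise_std, arr.shape)  # random noise
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
```

The degraded 256 image would then be paired with the clean 512 render as the (input, target) pair for super-resolution fine-tuning.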
Hello, thank you for your great work on high-resolution image-to-3D generation! I noticed that you utilized a ControlNet-Tile based on SD1.5 for the first stage of super-resolution. I am curious which data you used for fine-tuning. Fine-tuning a ControlNet usually requires data pairs (e.g., image-normal pairs, image-depth pairs, LR-HR pairs).
Thank you :)