JeffreyXiang / GRAM-HD

PyTorch implementation of the ICCV paper "GRAM-HD: 3D-Consistent Image Generation at High Resolution with Generative Radiance Manifolds"
https://jeffreyxiang.github.io/GRAM-HD/
MIT License

Does GRAMHD support 360-degree object data, like Carla? #2

Open JiuTongBro opened 6 months ago

JiuTongBro commented 6 months ago

Hi. Thanks for your excellent work.

I wonder whether GRAM-HD supports 360-degree object data, like Carla/ShapeNet? I noticed GRAM achieved satisfying results on Carla, so I am quite interested in what GRAM-HD would produce. Have you tested on those datasets?😀

JeffreyXiang commented 6 months ago

We've tested our method on Carla. It can work, but the performance is not as satisfying as GRAM at 128^2. This may be because the artifacts caused by the limited number of manifold surfaces are hard to suppress at higher resolution and under 360-degree viewpoints.

JiuTongBro commented 6 months ago

Thanks for your reply!

I wrote a GRAM-64 config and a GRAMHD-128 config myself based on the GRAM code and tried to train GRAM-HD on Carla. However, it fails even in the coarse GRAM-64 stage.

These are the coarse GRAM results after 100,000 iterations of training: (screenshot attached)

I suppose perhaps I made some mistakes in my modifications? I didn't change anything in your version's GRAM code. I wonder, is your version of the GRAM code the same as the original GRAM code? Is there anything else I need to modify to make the code run successfully on the Carla dataset?

This is the modified GRAM-64 config:

import math

GRAM64_Carla = {
    'global': {
        'img_size': 64,
        'batch_size': 4,
        'z_dist': 'gaussian',
    },
    'optimizer': {
        'gen_lr': 2e-5,
        'disc_lr': 2e-4,
        'sampling_network_lr': 2e-6,
        'betas': (0, 0.9),
        'grad_clip': 0.3,
    },
    'process': {
        'class': 'Gan3DProcess',
        'kwargs': {
            'batch_split': 4,
            'real_pos_lambda': 15.,
            'r1_lambda': 1.,
            'pos_lambda': 15.,
        }
    },
    'generator': {
        'class': 'GramGenerator',
        'kwargs': {
            'z_dim': 256,
            'img_size': 64,
            'h_stddev': math.pi,
            'v_stddev': math.pi*(42.5/180),
            'h_mean': math.pi*0.5,
            'v_mean': math.pi*(42.5/180),
            'sample_dist': 'spherical_uniform',
        },
        'representation': {
            'class': 'gram',
            'kwargs': {
                'hidden_dim': 256,
                'normalize': 2,
                'sigma_clamp_mode': 'softplus',
                'rgb_clamp_mode': 'widen_sigmoid',
                'hidden_dim_sample': 256,
                'layer_num_sample': 3,
                'center': (0, 0, 0),
                'init_radius': 0,
            },
        },
        'renderer': {
            'class': 'manifold_renderer',
            'kwargs': {
                'num_samples': 64,
                'num_manifolds': 48,
                'levels_start': 35,
                'levels_end': 5,
                'delta_alpha': 0.02,
                'last_back': False,
                'white_back': True,
            }
        }
    },
    'discriminator': {
        'class': 'GramEncoderDiscriminator',
        'kwargs': {
            'img_size': 64,
        }
    },
    'dataset': {
        'class': 'CARLA',
        'kwargs': {
            'img_size': 64,
            'real_pose': True,
        }
    },
    'camera': {
        'fov': 30,
        'ray_start': 0.7,
        'ray_end': 1.3,
    }
}
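For context, the pose parameters in this config imply full 360-degree azimuth coverage: with `h_stddev = math.pi` and `h_mean = math.pi*0.5`, the horizontal angle spans a full 2π range. A minimal sketch of what a `'spherical_uniform'` sampler could look like under these values (this is an assumption about the sampling scheme, not GRAM's actual code):

```python
import math
import random

def sample_camera_pose(h_mean, h_stddev, v_mean, v_stddev):
    """Hypothetical sketch of a 'spherical_uniform' pose sampler
    (an assumption, not the repo's actual implementation)."""
    # Azimuth uniform over [h_mean - h_stddev, h_mean + h_stddev];
    # h_stddev = pi gives full 360-degree coverage.
    h = h_mean + (2.0 * random.random() - 1.0) * h_stddev
    # Elevation drawn uniformly over the sphere's surface area
    # within [v_mean - v_stddev, v_mean + v_stddev].
    v_min, v_max = v_mean - v_stddev, v_mean + v_stddev
    u = random.random()
    v = math.acos(math.cos(v_min) + u * (math.cos(v_max) - math.cos(v_min)))
    return h, v

# With the Carla values above: full azimuth, elevation in [0, 85] degrees.
h, v = sample_camera_pose(math.pi * 0.5, math.pi,
                          math.pi * (42.5 / 180), math.pi * (42.5 / 180))
```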

And this is the modified GRAMHD-128 config:

import math
import torch.nn as nn

GRAMHD128_Carla = {
    'global': {
        'img_size': 128,
        'batch_size': 4,
        'z_dist': 'gaussian',
    },
    'optimizer': {
        'gen_lr': 2e-5,
        'disc_lr': 2e-4,
        'sampling_network_lr': 2e-6,
        'betas': (0, 0.9),
        'grad_clip': 0.3,
    },
    'process': {
        'class': 'SRGan3DProcess',
        'kwargs': {
            'batch_split': 4,
            'pos_lambda': 15.,
            'real_pos_lambda': 15.,
            'r1_lambda': 1.,
            'cons_lambda': 3.,
            'use_patch_d': True,
            'patch_lambda': 0.1,
            'r1_patch': True,
        }
    },
    'generator': {
        'class': 'GramHDGenerator',
        'kwargs': {
            'z_dim': 256,
            'feature_dim': 32,
            'img_size': 128,
            'lr_img_size': 64,
            'h_stddev': math.pi,
            'v_stddev': math.pi*(42.5/180),
            'h_mean': math.pi*0.5,
            'v_mean': math.pi*(42.5/180),
            'sample_dist': 'spherical_uniform',
            'gram_model_file': 'out/carla_gram/step100000_generator.pth',    # If you want to train your own model, set this to the stage1 GRAM model file
        },
        'representation': {
            'class': 'gram',
            'kwargs': {
                'hidden_dim': 256,
                'normalize': 2,
                'sigma_clamp_mode': 'softplus',
                'rgb_clamp_mode': 'widen_sigmoid',
                'hidden_dim_sample': 256,
                'layer_num_sample': 3,
                'center': (0, 0, 0),
                'init_radius': 0,
            },
        },
        'super_resolution': {
            'class': 'styleesrgan',
            'kwargs': {
                'fg': {
                    'w_dim': 256,
                    'nf': 64,
                    'nb': 8,
                    'gc': 32,
                    'up_channels': [64,],
                    'to_rgb_ks': 1,
                },
                'bg': {
                    'nf': 64,
                    'nb': 4,
                    'gc': 32,
                    'up_channels': [64,],
                    'use_pixel_shuffle': False,
                    'global_residual': True
                },
            }
        },
        'renderer': {
            'class': 'manifold_sr_renderer',
            'kwargs': {
                'num_samples': 64,
                'num_manifolds': 48,
                'levels_start': 35,
                'levels_end': 5,
                'delta_alpha': 0.02,
                'last_back': False,
                'white_back': True,
            }
        }
    },
    'discriminator': {
        'class': 'GramEncoderPatchDiscriminator',
        'kwargs': {
            'img_size': 128,
            'norm_layer': nn.Identity,
        }
    },
    'dataset': {
        'class': 'CARLA',
        'kwargs': {
            'img_size': 128,
            'real_pose': True,
        }
    },
    'camera': {
        'fov': 30,
        'ray_start': 0.7,
        'ray_end': 1.3,
    }
}

Would you kindly help me figure out the error? Thanks!

JeffreyXiang commented 6 months ago

I checked my implementation and did find that I missed some configs used for Carla training. The sampling_network_lr argument does not actually change the learning rate for the sampling network.

I think one option is to modify the code according to the GRAM code for this part. Another option is to directly use the 128^2 Carla checkpoint from GRAM (my Carla experiment directly used GRAM's checkpoint, which is why I didn't notice this issue).
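For the first option, the usual PyTorch way to give the sampling network its own learning rate is a separate optimizer parameter group. A minimal sketch, where `sample_net` and `radiance_net` are hypothetical stand-ins for the generator's submodules (the real module names in the repo may differ):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the generator: 'sample_net' plays the role
# of GRAM's ray-intersection sampling network, 'radiance_net' the rest.
generator = nn.ModuleDict({
    'sample_net': nn.Linear(3, 256),
    'radiance_net': nn.Linear(256, 4),
})

gen_lr = 2e-5
sampling_network_lr = 2e-6

# Per-parameter-group learning rates: the sampling network gets its own
# smaller lr; the second group falls back to the default generator lr.
optimizer_g = torch.optim.Adam([
    {'params': generator['sample_net'].parameters(),
     'lr': sampling_network_lr},
    {'params': generator['radiance_net'].parameters()},
], lr=gen_lr, betas=(0, 0.9))
```

Without this, an optimizer built from `generator.parameters()` alone would apply `gen_lr` to every weight, which matches the symptom that `sampling_network_lr` has no effect.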

JiuTongBro commented 6 months ago

Sincerely thanks! I will try it.

JiuTongBro commented 6 months ago

Hi. Thanks for your suggestion.

I followed this suggestion and directly trained a GRAM-HD model based on the official GRAM-Carla-128 checkpoint.

However, the low-resolution generated images, which are produced by the low-resolution GRAM, seem to be worse than the inference results produced by the official GRAM code. I checked the low-res results generated in different epochs; they are all identical, so the low-res GRAM is indeed frozen during the HD training. (screenshot attached)
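Incidentally, the "frozen" check above can also be done directly on saved checkpoints rather than by eyeballing images: compare the LR branch's parameters across two training steps. A sketch with toy state dicts (the `'gram.'` key prefix and checkpoint naming are assumptions, not the repo's actual layout):

```python
import torch

def branch_is_frozen(sd_a, sd_b, prefix):
    # Compare all parameters under `prefix` between two checkpoints'
    # state dicts; True means that branch did not change.
    keys = [k for k in sd_a if k.startswith(prefix)]
    return bool(keys) and all(torch.equal(sd_a[k], sd_b[k]) for k in keys)

# Toy state dicts standing in for two saved generator checkpoints.
sd1 = {'gram.w': torch.ones(2), 'sr.w': torch.ones(2)}
sd2 = {'gram.w': torch.ones(2), 'sr.w': torch.zeros(2)}
print(branch_is_frozen(sd1, sd2, 'gram.'))  # True: LR branch unchanged
print(branch_is_frozen(sd1, sd2, 'sr.'))    # False: SR branch was trained
```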

The only change I made was to remove the downsampling in the cross-resolution consistency loss, since the HR and LR images both have a resolution of 128:

# Skip the bicubic downsampling when the SR output already matches the
# LR resolution (scale_factor == 1); otherwise downsample before comparing.
if generator_ddp.module.scale_factor == 1.:
    cons_penalty = self.cons_lambda * ((gen_imgs - lr_imgs)**2).mean()
    cons_penalty += self.cons_lambda * ((sr_rgba - lr_rgba)**2).mean()
else:
    cons_penalty = self.cons_lambda * ((bicubic_downsample(gen_imgs, generator_ddp.module.scale_factor) - lr_imgs)**2).mean()
    cons_penalty += self.cons_lambda * ((bicubic_downsample(sr_rgba, generator_ddp.module.scale_factor) - lr_rgba)**2).mean()
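For readers following along, `bicubic_downsample` is not shown in this snippet; a minimal functional equivalent (an assumption about the repo's helper, which may differ in padding or filtering details) would be:

```python
import torch
import torch.nn.functional as F

def bicubic_downsample(img, scale_factor):
    # Hypothetical minimal stand-in for the repo's helper: bicubic
    # resampling of an NCHW tensor down by a factor of `scale_factor`.
    return F.interpolate(img, scale_factor=1.0 / scale_factor,
                         mode='bicubic', align_corners=False)

hr = torch.randn(1, 3, 128, 128)
lr = bicubic_downsample(hr, 2)
print(lr.shape)  # torch.Size([1, 3, 64, 64])
```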

Do you know why this happens? And did you notice the same issue in your Carla training?

Thanks!😀