google-research / nerf-from-image

Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion
Apache License 2.0

Error when running `python run.py --dataset carla --path_length_regularization --gpus 4 --batch_size 8` #14

Closed by 013292 1 year ago

013292 commented 1 year ago
Namespace(gpus=4, dataset='carla', xid='', resolution=32, batch_size=8, run_inversion=False, resume_from=None, root_path='.', data_path='../data/nerf', iterations=300000, lr_g=0.0025, lr_d=0.002, dual_discriminator=False, dual_discriminator_l1=False, dual_discriminator_mse=False, r1=5.0, tv=0.5, entropy=0.05, eikonal=0.1, supervise_alpha=False, conditional_pose=True, augment_p=0, augment_ada=False, ada_target=0.6, path_length_regularization=True, perturb_poses=0, clip_gradient_norm=100.0, fine_sampling=True, attention_values=10, use_sdf=True, use_encoder=False, use_viewdir=False, use_class=False, latent_dim=512, disable_stylegan_noise=True, inv_use_testset=False, inv_use_imagenet_testset=False, inv_use_separate=False, inv_loss='vgg', inv_gain_z=5, inv_steps=None, inv_no_split=False, inv_no_optimize_pose=False, inv_train_coord_only=False, inv_encoder_only=False, inv_export_demo_sample=False, inv_manual_input_path=None, coord_resume_from=None)
Experiment name g_carla_res32_bs8_d512_lrg_0.0025_lrd_0.002_r1_5.0_entropy_0.05_tv_0.5_fine_sdf_eik0.1_attn10_noalpha_pose_ppl_nonoise
Saving checkpoints to ./gan_checkpoints/g_carla_res32_bs8_d512_lrg_0.0025_lrd_0.002_r1_5.0_entropy_0.05_tv_0.5_fine_sdf_eik0.1_attn10_noalpha_pose_ppl_nonoise
Saving tensorboard logs to ./gan_logs/g_carla_res32_bs8_d512_lrg_0.0025_lrd_0.002_r1_5.0_entropy_0.05_tv_0.5_fine_sdf_eik0.1_attn10_noalpha_pose_ppl_nonoise
Loading data...
10000 images
100%|████████████████████████████████████████████████████████████████████| 313/313 [00:37<00:00,  8.40it/s]
torch.Size([10000, 32, 32, 3])
Initializing Inception network, tensorflow weights...
Computing FID stats for training set...
100%|██████████████████████████████████████████████████████████████████| 1250/1250 [00:14<00:00, 87.64it/s]
Evaluating training FID on 8000 images
Params G: 32.189208 M
Params D_0: 22.03904 M
Effective G lr: 0.00025
Effective D lr: 0.0002
SDF pre-training...
dist 5.601495742797852 eik 0.3089677691459656
dist 0.0919375941157341 eik 0.21124015748500824
dist 0.023604262620210648 eik 0.09792964160442352
dist 0.024225061759352684 eik 0.06614325940608978
dist 0.01424572616815567 eik 0.05083160847425461
dist 0.008697261102497578 eik 0.0333218015730381
dist 0.007261154241859913 eik 0.029520578682422638
dist 0.008733823895454407 eik 0.02414126880466938
dist 0.005254837684333324 eik 0.022821318358182907
dist 0.005635770969092846 eik 0.01830517128109932
SDF pre-training done.
Traceback (most recent call last):
  File "/home/shared/RX0251_wangfeifan/workspace/nerf-from-image-main/run.py", line 1003, in <module>
    discriminated = target_discriminator(img_batch, i,
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 171, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py", line 181, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 89, in parallel_apply
    output.reraise()
  File "/opt/conda/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/shared/RX0251_wangfeifan/workspace/nerf-from-image-main/models/discriminator.py", line 80, in forward
    return self.backbone(x, cond)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/shared/RX0251_wangfeifan/workspace/nerf-from-image-main/models/stylegan.py", line 675, in forward
    x = self.b4(x, cmap)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/shared/RX0251_wangfeifan/workspace/nerf-from-image-main/models/stylegan.py", line 600, in forward
    x = self.mbstd(x)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/shared/RX0251_wangfeifan/workspace/nerf-from-image-main/models/stylegan.py", line 559, in forward
    y = x.reshape(ng, -1, f, nc, h, w)
RuntimeError: shape '[4, -1, 1, 512, 4, 4]' is invalid for input of size 16384

dariopavllo commented 1 year ago

Hi,

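With `--batch_size 8` and `--gpus 4`, each GPU receives only 2 samples (the failing tensor has 16384 = 2 × 512 × 4 × 4 elements), but the minibatch std layer in the discriminator reshapes its input into groups of 4. You have to either increase the batch size or reduce the number of GPUs so that the batch size per GPU is a multiple of 4. Alternatively, try commenting out the minibatch std layer or changing its group size in the discriminator.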
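
As a quick sanity check before launching a run, the divisibility rule can be sketched in plain Python. This is only an illustration, not code from the repository: the helper names `per_gpu_batch_sizes` and `is_compatible` are hypothetical, the group size of 4 is read off the error message, and the chunk-style split is an assumption about how `DataParallel` scatters the batch.

```python
import math


def per_gpu_batch_sizes(batch_size: int, gpus: int) -> list[int]:
    """Approximate the per-GPU split of a batch along dim 0.

    Assumption: chunk-style split, i.e. chunks of ceil(batch_size / gpus)
    samples, with the last chunk possibly smaller.
    """
    chunk = math.ceil(batch_size / gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        size = min(chunk, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def is_compatible(batch_size: int, gpus: int, group_size: int = 4) -> bool:
    """True if every replica's batch can be split into groups of group_size."""
    return all(n % group_size == 0 for n in per_gpu_batch_sizes(batch_size, gpus))


# Samples per replica, recovered from the error message:
# input size 16384 with trailing dimensions 512 x 4 x 4.
samples_per_replica = 16384 // (512 * 4 * 4)  # 2

# The failing configuration from this issue: 8 samples over 4 GPUs.
failing_ok = is_compatible(batch_size=8, gpus=4)   # False: 2 samples per GPU
# Two possible fixes: a larger batch, or fewer GPUs.
bigger_batch_ok = is_compatible(batch_size=16, gpus=4)  # True: 4 per GPU
fewer_gpus_ok = is_compatible(batch_size=8, gpus=2)     # True: 4 per GPU
```

For example, either `--batch_size 16 --gpus 4` or `--batch_size 8 --gpus 2` would satisfy the check under these assumptions.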

013292 commented 1 year ago

Thank you, it works.