NaN or Inf found in input tensor.

Description

(venv) kai@ns-staging:~/workspace/stable-dreamfusion$ python main.py --text "A red dinosaur in boots." --workspace /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k -O --iters 30000
Namespace(file=None, text='A red dinosaur in boots.', negative='', O=True, O2=False, test=False, six_views=False, eval_interval=1, test_interval=100, workspace='/var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k', seed=None, image=None, image_config=None, known_view_interval=4, IF=False, guidance=['SD'], guidance_scale=100, save_mesh=False, mcubes_resolution=256, decimate_target=50000.0, dmtet=False, tet_grid_size=128, init_with='', lock_geo=False, perpneg=False, negative_w=-2, front_decay_factor=2, side_decay_factor=10, iters=30000, lr=0.001, ckpt='latest', cuda_ray=True, taichi_ray=False, max_steps=1024, num_steps=64, upsample_steps=32, update_extra_interval=16, max_ray_batch=4096, latent_iter_ratio=0.2, albedo_iter_ratio=0, min_ambient_ratio=0.1, textureless_ratio=0.2, jitter_pose=False, jitter_center=0.2, jitter_target=0.2, jitter_up=0.02, uniform_sphere_rate=0, grad_clip=-1, grad_clip_rgb=-1, bg_radius=1.4, density_activation='exp', density_thresh=10, blob_density=5, blob_radius=0.2, backbone='grid', optim='adan', sd_version='2.1', hf_key=None, fp16=True, vram_O=False, w=64, h=64, known_view_scale=1.5, known_view_noise_scale=0.002, dmtet_reso_scale=8, batch_size=1, bound=1, dt_gamma=0, min_near=0.01, radius_range=[3.0, 3.5], theta_range=[45, 105], phi_range=[-180, 180], fovy_range=[10, 30], default_radius=3.2, default_polar=90, default_azimuth=0, default_fovy=20, progressive_view=False, progressive_view_init_ratio=0.2, progressive_level=False, angle_overhead=30, angle_front=60, t_range=[0.02, 0.98], dont_override_stuff=False, lambda_entropy=0.001, lambda_opacity=0, lambda_orient=0.01, lambda_tv=0, lambda_wd=0, lambda_mesh_normal=0.5, lambda_mesh_laplacian=0.5, lambda_guidance=1, lambda_rgb=1000, lambda_mask=500, lambda_normal=0, lambda_depth=10, lambda_2d_normal_smooth=0, lambda_3d_normal_smooth=0, save_guidance=False, save_guidance_interval=10, gui=False, W=800, H=800, radius=5, fovy=20, light_theta=60, light_phi=0, max_spp=1, 
zero123_config='./pretrained/zero123/sd-objaverse-finetune-c_concat-256.yaml', zero123_ckpt='./pretrained/zero123/105000.ckpt', zero123_grad_scale='angle', dataset_size_train=100, dataset_size_valid=8, dataset_size_test=100, exp_start_iter=0, exp_end_iter=30000, images=None, ref_radii=[], ref_polars=[], ref_azimuths=[], zero123_ws=[], default_zero123_w=1)
NeRFNetwork(
  (encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 2048 per_level_scale=1.3819 params=(6098120, 2) gridtype=hash align_corners=False interpolation=smoothstep
  (sigma_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=32, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=64, bias=True)
      (2): Linear(in_features=64, out_features=4, bias=True)
    )
  )
  (encoder_bg): FreqEncoder: input_dim=3 degree=6 output_dim=39
  (bg_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=39, out_features=32, bias=True)
      (1): Linear(in_features=32, out_features=3, bias=True)
    )
  )
)
[INFO] loading stable diffusion...
[INFO] loaded stable diffusion!
[INFO] Cmdline: main.py --text A red dinosaur in boots. --workspace /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k -O --iters 30000
[INFO] opt: Namespace(file=None, text='A red dinosaur in boots.', negative='', O=True, O2=False, test=False, six_views=False, eval_interval=1, test_interval=100,
workspace='/var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k', seed=None, image=None, image_config=None, known_view_interval=4, IF=False, guidance=['SD'], guidance_scale=100, save_mesh=False,
mcubes_resolution=256, decimate_target=50000.0, dmtet=False, tet_grid_size=128, init_with='', lock_geo=False, perpneg=False, negative_w=-2, front_decay_factor=2, side_decay_factor=10, iters=30000, lr=0.001,
ckpt='latest', cuda_ray=True, taichi_ray=False, max_steps=1024, num_steps=64, upsample_steps=32, update_extra_interval=16, max_ray_batch=4096, latent_iter_ratio=0.2, albedo_iter_ratio=0, min_ambient_ratio=0.1,
textureless_ratio=0.2, jitter_pose=False, jitter_center=0.2, jitter_target=0.2, jitter_up=0.02, uniform_sphere_rate=0, grad_clip=-1, grad_clip_rgb=-1, bg_radius=1.4, density_activation='exp', density_thresh=10,
blob_density=5, blob_radius=0.2, backbone='grid', optim='adan', sd_version='2.1', hf_key=None, fp16=True, vram_O=False, w=64, h=64, known_view_scale=1.5, known_view_noise_scale=0.002, dmtet_reso_scale=8,
batch_size=1, bound=1, dt_gamma=0, min_near=0.01, radius_range=[3.0, 3.5], theta_range=[45, 105], phi_range=[-180, 180], fovy_range=[10, 30], default_radius=3.2, default_polar=90, default_azimuth=0,
default_fovy=20, progressive_view=False, progressive_view_init_ratio=0.2, progressive_level=False, angle_overhead=30, angle_front=60, t_range=[0.02, 0.98], dont_override_stuff=False, lambda_entropy=0.001,
lambda_opacity=0, lambda_orient=0.01, lambda_tv=0, lambda_wd=0, lambda_mesh_normal=0.5, lambda_mesh_laplacian=0.5, lambda_guidance=1, lambda_rgb=1000, lambda_mask=500, lambda_normal=0, lambda_depth=10,
lambda_2d_normal_smooth=0, lambda_3d_normal_smooth=0, save_guidance=False, save_guidance_interval=10, gui=False, W=800, H=800, radius=5, fovy=20, light_theta=60, light_phi=0, max_spp=1,
zero123_config='./pretrained/zero123/sd-objaverse-finetune-c_concat-256.yaml', zero123_ckpt='./pretrained/zero123/105000.ckpt', zero123_grad_scale='angle', dataset_size_train=100, dataset_size_valid=8,
dataset_size_test=100, exp_start_iter=0, exp_end_iter=30000, images=None, ref_radii=[], ref_polars=[], ref_azimuths=[], zero123_ws=[], default_zero123_w=1)
[INFO] Trainer: df | 2023-07-17_21-08-20 | cuda | fp16 | /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k
[INFO] #parameters: 12204151
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.

......

==> [2023-07-17_21-23-40] Start Training /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k Epoch 81/300, lr=0.050000 ...
loss=1.0000 (1.0000), lr=0.050000: : 100% 100/100 [00:18<00:00,  5.36it/s]
==> [2023-07-17_21-23-59] Finished Epoch 81/300. CPU=3.9GB, GPU=8.0GB.
++> Evaluate /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k at epoch 81 ...
loss=0.0000 (0.0000): : 100% 8/8 [00:00<00:00, 53.78it/s]
++> Evaluate epoch 81 Finished.
==> [2023-07-17_21-23-59] Start Training /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k Epoch 82/300, lr=0.050000 ...
loss=1.0000 (1.0000), lr=0.050000: :  50% 50/100 [00:09<00:09,  5.39it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: :  51% 51/100 [00:09<00:09,  5.36it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: :  52% 52/100 [00:09<00:08,  5.35it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: :  53% 53/100 [00:09<00:08,  5.35it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: :  54% 54/100 [00:10<00:08,  5.34it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: :  55% 55/100 [00:10<00:08,  5.36it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: :  56% 56/100 [00:10<00:08,  5.33it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: :  57% 57/100 [00:10<00:08,  5.33it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: :  58% 58/100 [00:10<00:07,  5.36it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: :  59% 59/100 [00:10<00:07,  5.34it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: :  60% 60/100 [00:11<00:07,  5.35it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/kai/workspace/stable-dreamfusion/main.py:410 in <module>                                   │
│                                                                                                  │
│   407 │   │   │   test_loader = NeRFDataset(opt, device=device, type='test', H=opt.H, W=opt.W,   │
│   408 │   │   │                                                                                  │
│   409 │   │   │   max_epoch = np.ceil(opt.iters / len(train_loader)).astype(np.int32)            │
│ ❱ 410 │   │   │   trainer.train(train_loader, valid_loader, test_loader, max_epoch)              │
│   411 │   │   │                                                                                  │
│   412 │   │   │   if opt.save_mesh:                                                              │
│   413 │   │   │   │   trainer.save_mesh()                                                        │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/nerf/utils.py:812 in train                                │
│                                                                                                  │
│    809 │   │   for epoch in range(self.epoch + 1, max_epochs + 1):                               │
│    810 │   │   │   self.epoch = epoch                                                            │
│    811 │   │   │                                                                                 │
│ ❱  812 │   │   │   self.train_one_epoch(train_loader, max_epochs)                                │
│    813 │   │   │                                                                                 │
│    814 │   │   │   if self.workspace is not None and self.local_rank == 0:                       │
│    815 │   │   │   │   self.save_checkpoint(full=True, best=False)                               │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/nerf/utils.py:1049 in train_one_epoch                     │
│                                                                                                  │
│   1046 │   │   │   │   │   save_guidance_path = save_guidance_folder / f'step_{self.global_step  │
│   1047 │   │   │   │   else:                                                                     │
│   1048 │   │   │   │   │   save_guidance_path = None                                             │
│ ❱ 1049 │   │   │   │   pred_rgbs, pred_depths, loss = self.train_step(data, save_guidance_path=  │
│   1050 │   │   │                                                                                 │
│   1051 │   │   │   # hooked grad clipping for RGB space                                          │
│   1052 │   │   │   if self.opt.grad_clip_rgb >= 0:                                               │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/nerf/utils.py:537 in train_step                           │
│                                                                                                  │
│    534 │   │   │   else:                                                                         │
│    535 │   │   │   │   bg_color = torch.rand(3).to(self.device) # single color random bg         │
│    536 │   │                                                                                     │
│ ❱  537 │   │   outputs = self.model.render(rays_o, rays_d, mvp, H, W, staged=False, perturb=Tru  │
│    538 │   │   pred_depth = outputs['depth'].reshape(B, 1, H, W)                                 │
│    539 │   │   pred_mask = outputs['weights_sum'].reshape(B, 1, H, W)                            │
│    540 │   │   if 'normal_image' in outputs:                                                     │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/nerf/renderer.py:1163 in render                           │
│                                                                                                  │
│   1160 │   │   if self.dmtet:                                                                    │
│   1161 │   │   │   results = self.run_dmtet(rays_o, rays_d, mvp, h, w, **kwargs)                 │
│   1162 │   │   elif self.cuda_ray:                                                               │
│ ❱ 1163 │   │   │   results = self.run_cuda(rays_o, rays_d, **kwargs)                             │
│   1164 │   │   elif self.taichi_ray:                                                             │
│   1165 │   │   │   results = self.run_taichi(rays_o, rays_d, **kwargs)                           │
│   1166 │   │   else:                                                                             │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/nerf/renderer.py:739 in run_cuda                          │
│                                                                                                  │
│    736 │   │   │   │   flatten_rays = raymarching.flatten_rays(rays, xyzs.shape[0]).long()       │
│    737 │   │   │   │   light_d = light_d[flatten_rays]                                           │
│    738 │   │   │                                                                                 │
│ ❱  739 │   │   │   sigmas, rgbs, normals = self(xyzs, dirs, light_d, ratio=ambient_ratio, shadi  │
│    740 │   │   │   weights, weights_sum, depth, image = raymarching.composite_rays_train(sigmas  │
│    741 │   │   │                                                                                 │
│    742 │   │   │   # normals related regularizations                                             │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/module │
│ .py:1501 in _call_impl                                                                           │
│                                                                                                  │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
│   1502 │   │   # Do not call functions when jit is used                                          │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1504 │   │   backward_pre_hooks = []                                                           │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/nerf/network_grid.py:110 in forward                       │
│                                                                                                  │
│   107 │   │   # l: [3], plane light direction, nomalized in [-1, 1]                              │
│   108 │   │   # ratio: scalar, ambient ratio, 1 == no shading (albedo only), 0 == only shading   │
│   109 │   │                                                                                      │
│ ❱ 110 │   │   sigma, albedo = self.common_forward(x)                                             │
│   111 │   │                                                                                      │
│   112 │   │   if shading == 'albedo':                                                            │
│   113 │   │   │   normal = None                                                                  │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/nerf/network_grid.py:73 in common_forward                 │
│                                                                                                  │
│    70 │   │   # sigma                                                                            │
│    71 │   │   enc = self.encoder(x, bound=self.bound, max_level=self.max_level)                  │
│    72 │   │                                                                                      │
│ ❱  73 │   │   h = self.sigma_net(enc)                                                            │
│    74 │   │                                                                                      │
│    75 │   │   sigma = self.density_activation(h[..., 0] + self.density_blob(x))                  │
│    76 │   │   albedo = torch.sigmoid(h[..., 1:])                                                 │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/module │
│ .py:1501 in _call_impl                                                                           │
│                                                                                                  │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
│   1502 │   │   # Do not call functions when jit is used                                          │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1504 │   │   backward_pre_hooks = []                                                           │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/nerf/network_grid.py:29 in forward                        │
│                                                                                                  │
│    26 │                                                                                          │
│    27 │   def forward(self, x):                                                                  │
│    28 │   │   for l in range(self.num_layers):                                                   │
│ ❱  29 │   │   │   x = self.net[l](x)                                                             │
│    30 │   │   │   if l != self.num_layers - 1:                                                   │
│    31 │   │   │   │   x = F.relu(x, inplace=True)                                                │
│    32 │   │   return x                                                                           │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/module │
│ .py:1501 in _call_impl                                                                           │
│                                                                                                  │
│   1498 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
│   1499 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
│   1500 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1501 │   │   │   return forward_call(*args, **kwargs)                                          │
│   1502 │   │   # Do not call functions when jit is used                                          │
│   1503 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1504 │   │   backward_pre_hooks = []                                                           │
│                                                                                                  │
│ /home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/linear │
│ .py:114 in forward                                                                               │
│                                                                                                  │
│   111 │   │   │   init.uniform_(self.bias, -bound, bound)                                        │
│   112 │                                                                                          │
│   113 │   def forward(self, input: Tensor) -> Tensor:                                            │
│ ❱ 114 │   │   return F.linear(input, self.weight, self.bias)                                     │
│   115 │                                                                                          │
│   116 │   def extra_repr(self) -> str:                                                           │
│   117 │   │   return 'in_features={}, out_features={}, bias={}'.format(                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

loss=nan (nan), lr=0.050000: :  60% 60/100 [00:11<00:07,  5.19it/s]
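
For triage: in the log above the loss turns to `nan` around step 51 of epoch 82, and the CUDA error follows only about ten steps later, so the crash may be a downstream symptom of the non-finite values rather than the root problem. Below is a minimal sketch of a guard that reports the first non-finite loss so the run can be stopped and inspected at that point. The helper names (`is_finite_loss`, `first_non_finite_step`, `iterations_done`) and the sample values are hypothetical and not part of stable-dreamfusion. The check only relies on `float()`, so it should accept a scalar loss tensor as well as a plain Python float. The iteration count assumes 100 steps per epoch, as suggested by `dataset_size_train=100` and the `100/100` progress bars.

```python
import math


def is_finite_loss(loss) -> bool:
    """Return True if the scalar loss is neither NaN nor +/-Inf."""
    return math.isfinite(float(loss))


def first_non_finite_step(losses):
    """Return the 1-based index of the first non-finite loss, or None."""
    for step, loss in enumerate(losses, start=1):
        if not is_finite_loss(loss):
            return step
    return None


def iterations_done(epoch: int, step: int, steps_per_epoch: int = 100) -> int:
    """Iterations completed once `step` of the 1-based `epoch` has finished."""
    return (epoch - 1) * steps_per_epoch + step


# Hypothetical values mirroring the log: the loss stays at 1.0 for 50 steps
# and is NaN from step 51 of epoch 82 onward.
sample_losses = [1.0] * 50 + [float("nan")] * 10
bad_step = first_non_finite_step(sample_losses)
print(bad_step, iterations_done(82, bad_step))
```

Under these assumptions the first bad step is roughly iteration 8151 of the requested 30000, which is a useful point at which to save a checkpoint or dump the offending tensors.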

Steps to Reproduce

python main.py --text "A red dinosaur in boots." --workspace /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k -O --iters 30000
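
The error message itself suggests passing `CUDA_LAUNCH_BLOCKING=1`, since the reported stack trace may be inaccurate when kernel errors surface asynchronously. A small sketch of how the same command could be rebuilt with that variable set follows. `debug_env` and `repro_command` are hypothetical helpers, and the `--seed` flag is an assumption inferred from `seed=None` in the printed `Namespace`; fixing a seed may make the failure easier to reproduce, but that is not confirmed here.

```python
import os
import shlex


def debug_env(base_env=None):
    """Copy the environment and force synchronous CUDA kernel launches."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def repro_command(seed=None):
    """Build the reproduction command from this report as an argument list."""
    cmd = [
        "python", "main.py",
        "--text", "A red dinosaur in boots.",
        "--workspace", "/var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k",
        "-O",
        "--iters", "30000",
    ]
    if seed is not None:
        # Assumed flag name, inferred from `seed=None` in the Namespace dump.
        cmd += ["--seed", str(seed)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of launching it; to run it, one could use
    # subprocess.run(repro_command(seed=0), env=debug_env(), check=True).
    print(shlex.join(repro_command(seed=0)))
```

With synchronous launches the traceback should point at the kernel that actually failed, at the cost of slower training.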

Expected Behavior

Training runs to completion without NaN losses or a crash.

Environment

Ubuntu 22.02 / PyTorch 2.0.1 / CUDA 11.7

ashawkey / stable-dreamfusion

NaN or Inf found in input tensor. #331
