Open vr-devil opened 1 year ago
(venv) kai@ns-staging:~/workspace/stable-dreamfusion$ python main.py --text "A red dinosaur in boots." --workspace /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k -O --iters 30000
Namespace(file=None, text='A red dinosaur in boots.', negative='', O=True, O2=False, test=False, six_views=False, eval_interval=1, test_interval=100, workspace='/var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k', seed=None, image=None, image_config=None, known_view_interval=4, IF=False, guidance=['SD'], guidance_scale=100, save_mesh=False, mcubes_resolution=256, decimate_target=50000.0, dmtet=False, tet_grid_size=128, init_with='', lock_geo=False, perpneg=False, negative_w=-2, front_decay_factor=2, side_decay_factor=10, iters=30000, lr=0.001, ckpt='latest', cuda_ray=True, taichi_ray=False, max_steps=1024, num_steps=64, upsample_steps=32, update_extra_interval=16, max_ray_batch=4096, latent_iter_ratio=0.2, albedo_iter_ratio=0, min_ambient_ratio=0.1, textureless_ratio=0.2, jitter_pose=False, jitter_center=0.2, jitter_target=0.2, jitter_up=0.02, uniform_sphere_rate=0, grad_clip=-1, grad_clip_rgb=-1, bg_radius=1.4, density_activation='exp', density_thresh=10, blob_density=5, blob_radius=0.2, backbone='grid', optim='adan', sd_version='2.1', hf_key=None, fp16=True, vram_O=False, w=64, h=64, known_view_scale=1.5, known_view_noise_scale=0.002, dmtet_reso_scale=8, batch_size=1, bound=1, dt_gamma=0, min_near=0.01, radius_range=[3.0, 3.5], theta_range=[45, 105], phi_range=[-180, 180], fovy_range=[10, 30], default_radius=3.2, default_polar=90, default_azimuth=0, default_fovy=20, progressive_view=False, progressive_view_init_ratio=0.2, progressive_level=False, angle_overhead=30, angle_front=60, t_range=[0.02, 0.98], dont_override_stuff=False, lambda_entropy=0.001, lambda_opacity=0, lambda_orient=0.01, lambda_tv=0, lambda_wd=0, lambda_mesh_normal=0.5, lambda_mesh_laplacian=0.5, lambda_guidance=1, lambda_rgb=1000, lambda_mask=500, lambda_normal=0, lambda_depth=10, lambda_2d_normal_smooth=0, lambda_3d_normal_smooth=0, save_guidance=False, save_guidance_interval=10, gui=False, W=800, H=800, radius=5, fovy=20, light_theta=60, light_phi=0, max_spp=1, zero123_config='./pretrained/zero123/sd-objaverse-finetune-c_concat-256.yaml', zero123_ckpt='./pretrained/zero123/105000.ckpt', zero123_grad_scale='angle', dataset_size_train=100, dataset_size_valid=8, dataset_size_test=100, exp_start_iter=0, exp_end_iter=30000, images=None, ref_radii=[], ref_polars=[], ref_azimuths=[], zero123_ws=[], default_zero123_w=1)
NeRFNetwork(
  (encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 2048 per_level_scale=1.3819 params=(6098120, 2) gridtype=hash align_corners=False interpolation=smoothstep
  (sigma_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=32, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=64, bias=True)
      (2): Linear(in_features=64, out_features=4, bias=True)
    )
  )
  (encoder_bg): FreqEncoder: input_dim=3 degree=6 output_dim=39
  (bg_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=39, out_features=32, bias=True)
      (1): Linear(in_features=32, out_features=3, bias=True)
    )
  )
)
[INFO] loading stable diffusion...
[INFO] loaded stable diffusion!
[INFO] Cmdline: main.py --text A red dinosaur in boots. --workspace /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k -O --iters 30000
[INFO] opt: Namespace(file=None, text='A red dinosaur in boots.', negative='', O=True, O2=False, test=False, six_views=False, eval_interval=1, test_interval=100, workspace='/var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k', seed=None, image=None, image_config=None, known_view_interval=4, IF=False, guidance=['SD'], guidance_scale=100, save_mesh=False, mcubes_resolution=256, decimate_target=50000.0, dmtet=False, tet_grid_size=128, init_with='', lock_geo=False, perpneg=False, negative_w=-2, front_decay_factor=2, side_decay_factor=10, iters=30000, lr=0.001, ckpt='latest', cuda_ray=True, taichi_ray=False, max_steps=1024, num_steps=64, upsample_steps=32, update_extra_interval=16, max_ray_batch=4096, latent_iter_ratio=0.2, albedo_iter_ratio=0, min_ambient_ratio=0.1, textureless_ratio=0.2, jitter_pose=False, jitter_center=0.2, jitter_target=0.2, jitter_up=0.02, uniform_sphere_rate=0, grad_clip=-1, grad_clip_rgb=-1, bg_radius=1.4, density_activation='exp', density_thresh=10, blob_density=5, blob_radius=0.2, backbone='grid', optim='adan', sd_version='2.1', hf_key=None, fp16=True, vram_O=False, w=64, h=64, known_view_scale=1.5, known_view_noise_scale=0.002, dmtet_reso_scale=8, batch_size=1, bound=1, dt_gamma=0, min_near=0.01, radius_range=[3.0, 3.5], theta_range=[45, 105], phi_range=[-180, 180], fovy_range=[10, 30], default_radius=3.2, default_polar=90, default_azimuth=0, default_fovy=20, progressive_view=False, progressive_view_init_ratio=0.2, progressive_level=False, angle_overhead=30, angle_front=60, t_range=[0.02, 0.98], dont_override_stuff=False, lambda_entropy=0.001, lambda_opacity=0, lambda_orient=0.01, lambda_tv=0, lambda_wd=0, lambda_mesh_normal=0.5, lambda_mesh_laplacian=0.5, lambda_guidance=1, lambda_rgb=1000, lambda_mask=500, lambda_normal=0, lambda_depth=10, lambda_2d_normal_smooth=0, lambda_3d_normal_smooth=0, save_guidance=False, save_guidance_interval=10, gui=False, W=800, H=800, radius=5, fovy=20, light_theta=60, light_phi=0, max_spp=1, zero123_config='./pretrained/zero123/sd-objaverse-finetune-c_concat-256.yaml', zero123_ckpt='./pretrained/zero123/105000.ckpt', zero123_grad_scale='angle', dataset_size_train=100, dataset_size_valid=8, dataset_size_test=100, exp_start_iter=0, exp_end_iter=30000, images=None, ref_radii=[], ref_polars=[], ref_azimuths=[], zero123_ws=[], default_zero123_w=1)
[INFO] Trainer: df | 2023-07-17_21-08-20 | cuda | fp16 | /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k
[INFO] #parameters: 12204151
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
......
==> [2023-07-17_21-23-40] Start Training /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k Epoch 81/300, lr=0.050000 ...
loss=1.0000 (1.0000), lr=0.050000: : 100% 100/100 [00:18<00:00, 5.36it/s]
==> [2023-07-17_21-23-59] Finished Epoch 81/300. CPU=3.9GB, GPU=8.0GB.
++> Evaluate /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k at epoch 81 ...
loss=0.0000 (0.0000): : 100% 8/8 [00:00<00:00, 53.78it/s]
++> Evaluate epoch 81 Finished.
==> [2023-07-17_21-23-59] Start Training /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k Epoch 82/300, lr=0.050000 ...
loss=1.0000 (1.0000), lr=0.050000: : 50% 50/100 [00:09<00:09, 5.39it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 51% 51/100 [00:09<00:09, 5.36it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 52% 52/100 [00:09<00:08, 5.35it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 53% 53/100 [00:09<00:08, 5.35it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 54% 54/100 [00:10<00:08, 5.34it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 55% 55/100 [00:10<00:08, 5.36it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 56% 56/100 [00:10<00:08, 5.33it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 57% 57/100 [00:10<00:08, 5.33it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 58% 58/100 [00:10<00:07, 5.36it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 59% 59/100 [00:10<00:07, 5.34it/s]NaN or Inf found in input tensor.
loss=nan (nan), lr=0.050000: : 60% 60/100 [00:11<00:07, 5.35it/s]
Traceback (most recent call last)

/home/kai/workspace/stable-dreamfusion/main.py:410 in <module>
   407             test_loader = NeRFDataset(opt, device=device, type='test', H=opt.H, W=opt.W,
   408
   409             max_epoch = np.ceil(opt.iters / len(train_loader)).astype(np.int32)
❱  410             trainer.train(train_loader, valid_loader, test_loader, max_epoch)
   411
   412             if opt.save_mesh:
   413                 trainer.save_mesh()

/home/kai/workspace/stable-dreamfusion/nerf/utils.py:812 in train
   809         for epoch in range(self.epoch + 1, max_epochs + 1):
   810             self.epoch = epoch
   811
❱  812             self.train_one_epoch(train_loader, max_epochs)
   813
   814             if self.workspace is not None and self.local_rank == 0:
   815                 self.save_checkpoint(full=True, best=False)

/home/kai/workspace/stable-dreamfusion/nerf/utils.py:1049 in train_one_epoch
  1046                     save_guidance_path = save_guidance_folder / f'step_{self.global_step
  1047                 else:
  1048                     save_guidance_path = None
❱ 1049                 pred_rgbs, pred_depths, loss = self.train_step(data, save_guidance_path=
  1050
  1051             # hooked grad clipping for RGB space
  1052             if self.opt.grad_clip_rgb >= 0:

/home/kai/workspace/stable-dreamfusion/nerf/utils.py:537 in train_step
   534             else:
   535                 bg_color = torch.rand(3).to(self.device) # single color random bg
   536
❱  537         outputs = self.model.render(rays_o, rays_d, mvp, H, W, staged=False, perturb=Tru
   538         pred_depth = outputs['depth'].reshape(B, 1, H, W)
   539         pred_mask = outputs['weights_sum'].reshape(B, 1, H, W)
   540         if 'normal_image' in outputs:

/home/kai/workspace/stable-dreamfusion/nerf/renderer.py:1163 in render
  1160         if self.dmtet:
  1161             results = self.run_dmtet(rays_o, rays_d, mvp, h, w, **kwargs)
  1162         elif self.cuda_ray:
❱ 1163             results = self.run_cuda(rays_o, rays_d, **kwargs)
  1164         elif self.taichi_ray:
  1165             results = self.run_taichi(rays_o, rays_d, **kwargs)
  1166         else:

/home/kai/workspace/stable-dreamfusion/nerf/renderer.py:739 in run_cuda
   736                 flatten_rays = raymarching.flatten_rays(rays, xyzs.shape[0]).long()
   737                 light_d = light_d[flatten_rays]
   738
❱  739             sigmas, rgbs, normals = self(xyzs, dirs, light_d, ratio=ambient_ratio, shadi
   740             weights, weights_sum, depth, image = raymarching.composite_rays_train(sigmas
   741
   742             # normals related regularizations

/home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl
  1498         if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks
  1499                 or _global_backward_pre_hooks or _global_backward_hooks
  1500                 or _global_forward_hooks or _global_forward_pre_hooks):
❱ 1501             return forward_call(*args, **kwargs)
  1502         # Do not call functions when jit is used
  1503         full_backward_hooks, non_full_backward_hooks = [], []
  1504         backward_pre_hooks = []

/home/kai/workspace/stable-dreamfusion/nerf/network_grid.py:110 in forward
   107         # l: [3], plane light direction, nomalized in [-1, 1]
   108         # ratio: scalar, ambient ratio, 1 == no shading (albedo only), 0 == only shading
   109
❱  110         sigma, albedo = self.common_forward(x)
   111
   112         if shading == 'albedo':
   113             normal = None

/home/kai/workspace/stable-dreamfusion/nerf/network_grid.py:73 in common_forward
    70         # sigma
    71         enc = self.encoder(x, bound=self.bound, max_level=self.max_level)
    72
❱   73         h = self.sigma_net(enc)
    74
    75         sigma = self.density_activation(h[..., 0] + self.density_blob(x))
    76         albedo = torch.sigmoid(h[..., 1:])

/home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl
  1498         if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks
  1499                 or _global_backward_pre_hooks or _global_backward_hooks
  1500                 or _global_forward_hooks or _global_forward_pre_hooks):
❱ 1501             return forward_call(*args, **kwargs)
  1502         # Do not call functions when jit is used
  1503         full_backward_hooks, non_full_backward_hooks = [], []
  1504         backward_pre_hooks = []

/home/kai/workspace/stable-dreamfusion/nerf/network_grid.py:29 in forward
    26
    27     def forward(self, x):
    28         for l in range(self.num_layers):
❱   29             x = self.net[l](x)
    30             if l != self.num_layers - 1:
    31                 x = F.relu(x, inplace=True)
    32         return x

/home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1501 in _call_impl
  1498         if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks
  1499                 or _global_backward_pre_hooks or _global_backward_hooks
  1500                 or _global_forward_hooks or _global_forward_pre_hooks):
❱ 1501             return forward_call(*args, **kwargs)
  1502         # Do not call functions when jit is used
  1503         full_backward_hooks, non_full_backward_hooks = [], []
  1504         backward_pre_hooks = []

/home/kai/workspace/stable-dreamfusion/venv/lib/python3.10/site-packages/torch/nn/modules/linear.py:114 in forward
   111             init.uniform_(self.bias, -bound, bound)
   112
   113     def forward(self, input: Tensor) -> Tensor:
❱  114         return F.linear(input, self.weight, self.bias)
   115
   116     def extra_repr(self) -> str:
   117         return 'in_features={}, out_features={}, bias={}'.format(

RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
loss=nan (nan), lr=0.050000: : 60% 60/100 [00:11<00:07, 5.19it/s]
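
The error message above recommends `CUDA_LAUNCH_BLOCKING=1`, since asynchronous kernel launches can make the reported frame misleading. A minimal sketch of such a rerun driven from Python, assuming it is started from the repository root; `build_debug_run` and `launch` are hypothetical helpers (not part of stable-dreamfusion), and the arguments are the ones from this report:

```python
import os
import subprocess
import sys


def build_debug_run(args):
    """Return (cmd, env) for rerunning main.py with synchronous CUDA launches.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    # The variable has to be in place before the child process initializes CUDA.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    cmd = [sys.executable, "main.py", *args]
    return cmd, env


def launch(cmd, env):
    """Start the run and raise if main.py exits with a non-zero status."""
    return subprocess.run(cmd, env=env, check=True)


# Arguments copied from the command in this report.
REPORT_ARGS = [
    "--text", "A red dinosaur in boots.",
    "--workspace", "/var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k",
    "-O",
    "--iters", "30000",
]

cmd, env = build_debug_run(REPORT_ARGS)
print(" ".join(cmd))
```

Calling `launch(cmd, env)` would start the actual training run. With blocking launches the run is slower, but the traceback should then point at the kernel that really failed.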
Steps to Reproduce
python main.py --text "A red dinosaur in boots." --workspace /var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k -O --iters 30000

Expected Behavior
No crash.

Environment
Ubuntu 22.02 / PyTorch 2.0.1 / CUDA 11.7

Try disabling `--cuda_ray`; that solved this issue for me. My guess is that it happens when `--cuda_ray` and `--fp16` are enabled together: the CUDA raymarching path then produces invalid tensor values, while the pure PyTorch path is fine.
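
A comment in this thread reports that disabling `--cuda_ray` avoids the crash. A minimal sketch of rewriting the reported arguments that way, under one unconfirmed assumption: that `-O` is shorthand for enabling both `--cuda_ray` and `--fp16` (the logged options show `O=True, cuda_ray=True, fp16=True`, which is consistent with this but does not prove it). The helper name `without_cuda_ray` is hypothetical.

```python
def without_cuda_ray(args):
    """Rewrite a main.py argument list so CUDA raymarching stays disabled.

    Assumption (not confirmed in this thread): -O turns on both --cuda_ray
    and --fp16, so it is replaced by --fp16 alone.
    """
    rewritten = []
    for arg in args:
        if arg == "--cuda_ray":
            continue  # drop the explicit flag
        if arg == "-O":
            arg = "--fp16"  # keep half precision, drop CUDA raymarching
        if arg == "--fp16" and "--fp16" in rewritten:
            continue  # avoid passing --fp16 twice
        rewritten.append(arg)
    return rewritten


# Arguments copied from the command in this report.
reported = [
    "--text", "A red dinosaur in boots.",
    "--workspace", "/var/lib/aigc/stable-dreamfusion/trial_dinosaur_iter30k",
    "-O",
    "--iters", "30000",
]
print(without_cuda_ray(reported))
```

If the crash persists with the rewritten arguments, dropping `--fp16` as well would help separate the two suspected causes.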