vr-devil closed this issue 1 year ago.
Hi Kai! I had the same error before; it should be fixed in the latest commit. Have you pulled recently?
You can reduce memory usage by changing this line in main.py: parser.add_argument('--max_steps', type=int, default=512, help="max num steps sampled per ray (only valid when using --cuda_ray)")
Change 'default' to a value that fits your GPU better. I hope it helps.
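Instead of editing the default in main.py, the value can also be overridden per run on the command line (for example --max_steps 64, where 64 is only an illustrative value), assuming nothing later in main.py overwrites it. A minimal sketch of how argparse resolves this, reusing the quoted definition:

```python
import argparse

parser = argparse.ArgumentParser()
# Same definition as the line quoted from main.py.
parser.add_argument('--max_steps', type=int, default=512,
                    help="max num steps sampled per ray (only valid when using --cuda_ray)")

# Without the flag, the default applies.
print(parser.parse_args([]).max_steps)  # 512

# Passing the flag overrides the default without editing main.py.
opt = parser.parse_args(['--max_steps', '64'])
print(opt.max_steps)  # 64
```

Keeping main.py unmodified this way also makes it easier to pull later fixes without merge conflicts.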
@gianluigidalessandro Thank you for the help.
I am using the latest commit.
I changed --max_steps and --num_steps to 1 and --w and --h to 16, but the OutOfMemoryError still happens.
PS C:\Workspaces\stable-dreamfusion> python main.py --text "a hamburger" --workspace trial -O
Namespace(text='a hamburger', negative='', O=True, O2=False, test=False, save_mesh=False, eval_interval=10, workspace='trial', guidance='stable-diffusion', seed=0, iters=10000, lr=0.001, ckpt='latest', cuda_ray=True, max_steps=1, num_steps=1, upsample_steps=1, update_extra_interval=16, max_ray_batch=4096, albedo_iters=1000, uniform_sphere_rate=0.5, bg_radius=1.4, density_thresh=10, fp16=True, backbone='grid', w=16, h=16, jitter_pose=False, bound=1, dt_gamma=0, min_near=0.1, radius_range=[1.0, 1.5], fovy_range=[40, 70], dir_text=True, suppress_face=False, angle_overhead=30, angle_front=60, lambda_entropy=0.0001, lambda_opacity=0, lambda_orient=0.01, lambda_smooth=0, gui=False, W=800, H=800, radius=3, fovy=60, light_theta=60, light_phi=0, max_spp=1)
NeRFNetwork(
  (encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 2048 per_level_scale=1.3819 params=(903480, 2) gridtype=tiled align_corners=False
  (sigma_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=32, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=64, bias=True)
      (2): Linear(in_features=64, out_features=4, bias=True)
    )
  )
  (encoder_bg): FreqEncoder: input_dim=3 degree=6 output_dim=39
  (bg_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=39, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=3, bias=True)
    )
  )
)
[INFO] loaded hugging face access token from ./TOKEN!
[INFO] loading stable diffusion...
[INFO] loaded stable diffusion!
[INFO] Trainer: df | 2022-11-12_00-02-12 | cuda | fp16 | trial
[INFO] #parameters: 1816247
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
==> Start Training trial Epoch 1, lr=0.010000 ...
0% 0/100 [00:00<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Workspaces\stable-dreamfusion\main.py:156 in <module> │
│ │
│ 153 │ │ │ valid_loader = NeRFDataset(opt, device=device, type='val', H=opt.H, W=opt.W, │
│ 154 │ │ │ │
│ 155 │ │ │ max_epoch = np.ceil(opt.iters / len(train_loader)).astype(np.int32) │
│ ❱ 156 │ │ │ trainer.train(train_loader, valid_loader, max_epoch) │
│ 157 │ │ │ │
│ 158 │ │ │ # also test │
│ 159 │ │ │ test_loader = NeRFDataset(opt, device=device, type='test', H=opt.H, W=opt.W, │
│ │
│ C:\Workspaces\stable-dreamfusion\nerf\utils.py:486 in train │
│ │
│ 483 │ │ for epoch in range(self.epoch + 1, max_epochs + 1): │
│ 484 │ │ │ self.epoch = epoch │
│ 485 │ │ │ │
│ ❱ 486 │ │ │ self.train_one_epoch(train_loader) │
│ 487 │ │ │ │
│ 488 │ │ │ if self.workspace is not None and self.local_rank == 0: │
│ 489 │ │ │ │ self.save_checkpoint(full=True, best=False) │
│ │
│ C:\Workspaces\stable-dreamfusion\nerf\utils.py:706 in train_one_epoch │
│ │
│ 703 │ │ │ self.optimizer.zero_grad() │
│ 704 │ │ │ │
│ 705 │ │ │ with torch.cuda.amp.autocast(enabled=self.fp16): │
│ ❱ 706 │ │ │ │ pred_rgbs, pred_ws, loss = self.train_step(data) │
│ 707 │ │ │ │
│ 708 │ │ │ self.scaler.scale(loss).backward() │
│ 709 │ │ │ self.scaler.step(self.optimizer) │
│ │
│ C:\Workspaces\stable-dreamfusion\nerf\utils.py:379 in train_step │
│ │
│ 376 │ │ │
│ 377 │ │ # encode pred_rgb to latents │
│ 378 │ │ # _t = time.time() │
│ ❱ 379 │ │ loss = self.guidance.train_step(text_z, pred_rgb) │
│ 380 │ │ # torch.cuda.synchronize(); print(f'[TIME] total guiding {time.time() - _t:.4f}s │
│ 381 │ │ │
│ 382 │ │ # occupancy loss │
│ │
│ C:\Workspaces\stable-dreamfusion\nerf\sd.py:87 in train_step │
│ │
│ 84 │ │ │
│ 85 │ │ # encode image into latents with vae, requires grad! │
│ 86 │ │ # _t = time.time() │
│ ❱ 87 │ │ latents = self.encode_imgs(pred_rgb_512) │
│ 88 │ │ # torch.cuda.synchronize(); print(f'[TIME] guiding: vae enc {time.time() - _t:.4 │
│ 89 │ │ │
│ 90 │ │ # predict the noise residual with unet, NO grad! │
│ │
│ C:\Workspaces\stable-dreamfusion\nerf\sd.py:161 in encode_imgs │
│ │
│ 158 │ │ │
│ 159 │ │ imgs = 2 * imgs - 1 │
│ 160 │ │ │
│ ❱ 161 │ │ posterior = self.vae.encode(imgs).latent_dist │
│ 162 │ │ latents = posterior.sample() * 0.18215 │
│ 163 │ │ │
│ 164 │ │ return latents │
│ │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\diffusers\models\vae.py:570 in encode │
│ │
│ 567 │ │ self.post_quant_conv = torch.nn.Conv2d(latent_channels, latent_channels, 1) │
│ 568 │ │
│ 569 │ def encode(self, x: torch.FloatTensor, return_dict: bool = True) -> AutoencoderKLOut │
│ ❱ 570 │ │ h = self.encoder(x) │
│ 571 │ │ moments = self.quant_conv(h) │
│ 572 │ │ posterior = DiagonalGaussianDistribution(moments) │
│ 573 │
│ │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py:1190 in _call_impl │
│ │
│ 1187 │ │ # this function, and just call forward. │
│ 1188 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1189 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1190 │ │ │ return forward_call(*input, **kwargs) │
│ 1191 │ │ # Do not call functions when jit is used │
│ 1192 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1193 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\diffusers\models\vae.py:134 in forward │
│ │
│ 131 │ │ │
│ 132 │ │ # down │
│ 133 │ │ for down_block in self.down_blocks: │
│ ❱ 134 │ │ │ sample = down_block(sample) │
│ 135 │ │ │
│ 136 │ │ # middle │
│ 137 │ │ sample = self.mid_block(sample) │
│ │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py:1190 in _call_impl │
│ │
│ 1187 │ │ # this function, and just call forward. │
│ 1188 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1189 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1190 │ │ │ return forward_call(*input, **kwargs) │
│ 1191 │ │ # Do not call functions when jit is used │
│ 1192 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1193 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\diffusers\models\unet_2d_blocks.py:741 in forward │
│ │
│ 738 │ │
│ 739 │ def forward(self, hidden_states): │
│ 740 │ │ for resnet in self.resnets: │
│ ❱ 741 │ │ │ hidden_states = resnet(hidden_states, temb=None) │
│ 742 │ │ │
│ 743 │ │ if self.downsamplers is not None: │
│ 744 │ │ │ for downsampler in self.downsamplers: │
│ │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py:1190 in _call_impl │
│ │
│ 1187 │ │ # this function, and just call forward. │
│ 1188 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1189 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1190 │ │ │ return forward_call(*input, **kwargs) │
│ 1191 │ │ # Do not call functions when jit is used │
│ 1192 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1193 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\diffusers\models\resnet.py:399 in forward │
│ │
│ 396 │ │ │ temb = self.time_emb_proj(self.nonlinearity(temb))[:, :, None, None] │
│ 397 │ │ │ hidden_states = hidden_states + temb │
│ 398 │ │ │
│ ❱ 399 │ │ hidden_states = self.norm2(hidden_states) │
│ 400 │ │ hidden_states = self.nonlinearity(hidden_states) │
│ 401 │ │ │
│ 402 │ │ hidden_states = self.dropout(hidden_states) │
│ │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py:1190 in _call_impl │
│ │
│ 1187 │ │ # this function, and just call forward. │
│ 1188 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1189 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1190 │ │ │ return forward_call(*input, **kwargs) │
│ 1191 │ │ # Do not call functions when jit is used │
│ 1192 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1193 │ │ if self._backward_hooks or _global_backward_hooks: │
│ │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\normalization.py:273 in forward │
│ │
│ 270 │ │ │ init.zeros_(self.bias) │
│ 271 │ │
│ 272 │ def forward(self, input: Tensor) -> Tensor: │
│ ❱ 273 │ │ return F.group_norm( │
│ 274 │ │ │ input, self.num_groups, self.weight, self.bias, self.eps) │
│ 275 │ │
│ 276 │ def extra_repr(self) -> str: │
│ │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\functional.py:2528 in group_norm │
│ │
│ 2525 │ if has_torch_function_variadic(input, weight, bias): │
│ 2526 │ │ return handle_torch_function(group_norm, (input, weight, bias,), input, num_grou │
│ 2527 │ _verify_batch_size([input.size(0) * input.size(1) // num_groups, num_groups] + list( │
│ ❱ 2528 │ return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.e │
│ 2529 │
│ 2530 │
│ 2531 def local_response_norm(input: Tensor, size: int, alpha: float = 1e-4, beta: float = 0.7 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 5.28 GiB already allocated; 0 bytes free; 5.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0% 0/100 [00:03<?, ?it/s]
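The log hints at why shrinking --w/--h and the step counts did not help. The NeRF itself is tiny: the grid (903480 × 2) plus the two MLPs in the model summary add up exactly to the reported #parameters of 1816247, about 7 MiB in float32. The failed allocation happens inside the Stable Diffusion VAE encoder, which receives pred_rgb_512; assuming that tensor is the render resized to 512×512 (as its name suggests), its size does not depend on --w/--h. A plausible but unverified accounting of the 128 MiB request: a float32 activation with batch 1, 128 channels and 512×512 pixels (assuming the VAE's first encoder stage has 128 channels and group norm runs in float32 under autocast) is exactly 128 MiB. The shape below is that assumption, not something read from the log.

```python
def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Parameter count of torch.nn.Linear: weight matrix plus optional bias."""
    return in_features * out_features + (out_features if bias else 0)


def tensor_mib(*shape: int, bytes_per_elem: int = 4) -> float:
    """Size in MiB of a dense tensor with the given shape (float32 by default)."""
    numel = 1
    for dim in shape:
        numel *= dim
    return numel * bytes_per_elem / 2**20


# NeRF parameters, read off the model summary in the log.
grid = 903480 * 2  # GridEncoder params=(903480, 2)
sigma_net = linear_params(32, 64) + linear_params(64, 64) + linear_params(64, 4)
bg_net = linear_params(39, 64) + linear_params(64, 3)  # FreqEncoder has no weights
nerf_total = grid + sigma_net + bg_net

print(nerf_total)                        # 1816247, matches "#parameters" in the log
print(round(nerf_total * 4 / 2**20, 2))  # 6.93 (MiB of NeRF weights in float32)

# Assumed shape of the failing activation: batch 1, 128 channels, 512x512, float32.
print(tensor_mib(1, 128, 512, 512))      # 128.0
```

If this accounting is right, most of the 6 GiB goes to Stable Diffusion (weights plus activations at a fixed 512×512), so NeRF-side flags can only save a few MiB.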
Did you try changing the 'default' parameter in main.py?
I would also check out this Stack Overflow question in case the problem persists: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
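The OOM message itself suggests PYTORCH_CUDA_ALLOC_CONF with max_split_size_mb. That setting only targets fragmentation, i.e. reserved memory much larger than allocated memory; here 5.35 GiB reserved vs 5.28 GiB allocated suggests fragmentation is not the main problem, so it may not help much. A sketch of setting it from Python, where 128 is an arbitrary example value and fragmentation_suspected with its 1.25 threshold is a hypothetical rule of thumb, not a PyTorch API. In PowerShell the equivalent would be $env:PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128" before running python main.py.

```python
import os

# The CUDA caching allocator reads this when it initializes, so set it before
# the first CUDA allocation (safest: before `import torch`).
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")


def fragmentation_suspected(allocated_gib: float, reserved_gib: float,
                            ratio: float = 1.25) -> bool:
    """Hypothetical heuristic for "reserved >> allocated"; the ratio is arbitrary."""
    return reserved_gib > ratio * allocated_gib


# Figures from the OOM message: 5.28 GiB allocated, 5.35 GiB reserved.
print(fragmentation_suspected(5.28, 5.35))  # False
```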
Yes, I changed the default parameters in main.py directly.
Thanks. I looked at that question before but didn't try its suggestions at the time because they require changing the code. I will try them later; I hope they resolve the OutOfMemoryError.
Good luck ;)!
The author said (https://github.com/ashawkey/stable-dreamfusion/issues/41#issuecomment-1283545530):
At least 12 GB of memory is required to run the model.
So sad, oh my god! My poor GTX 1060.
Hello, my graphics card runs out of memory when running the model. Which parameters can I adjust to avoid running out of memory?
Since I can't buy an RTX 4090 yet, I can only use my GTX 1060, which is many years old.
The out-of-memory error happens in epoch 1. What a sad story. 😢
Thanks for any suggestions.
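A rough sense of why a 6 GiB card struggles even with tiny renders: the frozen Stable Diffusion models have to sit in GPU memory alongside the training activations. The parameter counts below are approximate, rounded figures for Stable Diffusion v1.x and are assumptions (the thread does not say which checkpoint or precision stable-dreamfusion loads), so treat the output as ballpark numbers only.

```python
# Approximate parameter counts for Stable Diffusion v1.x components (assumed, rounded).
SD_PARAMS = {
    "unet": 860_000_000,
    "text_encoder": 123_000_000,
    "vae": 84_000_000,
}


def weights_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


total = sum(SD_PARAMS.values())
for precision, bytes_per_param in (("fp16", 2), ("fp32", 4)):
    print(f"{precision}: {weights_gib(total, bytes_per_param):.2f} GiB of weights")
```

Under these assumptions the weights alone take roughly 2 GiB in fp16 or 4 GiB in fp32, leaving only a few GiB for the 512×512 VAE and UNet passes and the CUDA context.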