ashawkey / stable-dreamfusion

Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.
Apache License 2.0

GTX 1060 - 6 GiB GPU RAM - CUDA out of memory. #82

Closed vr-devil closed 1 year ago

vr-devil commented 1 year ago

Hello, my graphics card runs out of memory while running the model. Which parameters can I adjust to avoid the memory overflow?

Since I can't buy an RTX 4090 yet, I can only use a GTX 1060 from many years ago.

OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 6.00 GiB total capacity; 5.28 GiB already allocated; 0 bytes free; 5.34 GiB reserved in total by PyTorch) If   
reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The memory overflow happened in Epoch 1, what a sad story. 😢

Thanks for any suggestions.

gianluigidalessandro commented 1 year ago

Hi Kai! I had the same error before; it should be fixed with the latest commit. Did you try pulling recently?

gianluigidalessandro commented 1 year ago

You can change the allocated memory in main.py at this line: parser.add_argument('--max_steps', type=int, default=512, help="max num steps sampled per ray (only valid when using --cuda_ray)")

Change "default" to whatever fits your GPU better. I hope it helps.

vr-devil commented 1 year ago

@gianluigidalessandro thank you for the help.

I am using the latest commit.

I changed --max_steps and --num_steps to 1, and --w and --h to 16, but the OutOfMemoryError still happens.

PS C:\Workspaces\stable-dreamfusion> python main.py --text "a hamburger"  --workspace trial -O 
Namespace(text='a hamburger', negative='', O=True, O2=False, test=False, save_mesh=False, eval_interval=10, workspace='trial', guidance='stable-diffusion', seed=0, iters=10000, lr=0.001, ckpt='latest', cuda_ray=True, max_steps=1, num_steps=1, upsample_steps=1, update_extra_interval=16, max_ray_batch=4096, albedo_iters=1000, uniform_sphere_rate=0.5, bg_radius=1.4, density_thresh=10, fp16=True, backbone='grid', w=16, h=16, jitter_pose=False, bound=1, dt_gamma=0, min_near=0.1, radius_range=[1.0, 1.5], fovy_range=[40, 70], dir_text=True, suppress_face=False, angle_overhead=30, angle_front=60, lambda_entropy=0.0001, lambda_opacity=0, lambda_orient=0.01, lambda_smooth=0, gui=False, W=800, H=800, radius=3, fovy=60, light_theta=60, light_phi=0, max_spp=1)
NeRFNetwork(
  (encoder): GridEncoder: input_dim=3 num_levels=16 level_dim=2 resolution=16 -> 2048 per_level_scale=1.3819 params=(903480, 2) gridtype=tiled align_corners=False
  (sigma_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=32, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=64, bias=True)
      (2): Linear(in_features=64, out_features=4, bias=True)
    )
  )
  (encoder_bg): FreqEncoder: input_dim=3 degree=6 output_dim=39
  (bg_net): MLP(
    (net): ModuleList(
      (0): Linear(in_features=39, out_features=64, bias=True)
      (1): Linear(in_features=64, out_features=3, bias=True)
    )
  )
)
[INFO] loaded hugging face access token from ./TOKEN!
[INFO] loading stable diffusion...
[INFO] loaded stable diffusion!
[INFO] Trainer: df | 2022-11-12_00-02-12 | cuda | fp16 | trial
[INFO] #parameters: 1816247
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
==> Start Training trial Epoch 1, lr=0.010000 ...
  0% 0/100 [00:00<?, ?it/s]╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Workspaces\stable-dreamfusion\main.py:156 in <module>                                         │
│                                                                                                  │
│   153 │   │   │   valid_loader = NeRFDataset(opt, device=device, type='val', H=opt.H, W=opt.W,   │
│   154 │   │   │                                                                                  │
│   155 │   │   │   max_epoch = np.ceil(opt.iters / len(train_loader)).astype(np.int32)            │
│ ❱ 156 │   │   │   trainer.train(train_loader, valid_loader, max_epoch)                           │
│   157 │   │   │                                                                                  │
│   158 │   │   │   # also test                                                                    │
│   159 │   │   │   test_loader = NeRFDataset(opt, device=device, type='test', H=opt.H, W=opt.W,   │
│                                                                                                  │
│ C:\Workspaces\stable-dreamfusion\nerf\utils.py:486 in train                                      │
│                                                                                                  │
│   483 │   │   for epoch in range(self.epoch + 1, max_epochs + 1):                                │
│   484 │   │   │   self.epoch = epoch                                                             │
│   485 │   │   │                                                                                  │
│ ❱ 486 │   │   │   self.train_one_epoch(train_loader)                                             │
│   487 │   │   │                                                                                  │
│   488 │   │   │   if self.workspace is not None and self.local_rank == 0:                        │
│   489 │   │   │   │   self.save_checkpoint(full=True, best=False)                                │
│                                                                                                  │
│ C:\Workspaces\stable-dreamfusion\nerf\utils.py:706 in train_one_epoch                            │
│                                                                                                  │
│   703 │   │   │   self.optimizer.zero_grad()                                                     │
│   704 │   │   │                                                                                  │
│   705 │   │   │   with torch.cuda.amp.autocast(enabled=self.fp16):                               │
│ ❱ 706 │   │   │   │   pred_rgbs, pred_ws, loss = self.train_step(data)                           │
│   707 │   │   │                                                                                  │
│   708 │   │   │   self.scaler.scale(loss).backward()                                             │
│   709 │   │   │   self.scaler.step(self.optimizer)                                               │
│                                                                                                  │
│ C:\Workspaces\stable-dreamfusion\nerf\utils.py:379 in train_step                                 │
│                                                                                                  │
│   376 │   │                                                                                      │
│   377 │   │   # encode pred_rgb to latents                                                       │
│   378 │   │   # _t = time.time()                                                                 │
│ ❱ 379 │   │   loss = self.guidance.train_step(text_z, pred_rgb)                                  │
│   380 │   │   # torch.cuda.synchronize(); print(f'[TIME] total guiding {time.time() - _t:.4f}s   │
│   381 │   │                                                                                      │
│   382 │   │   # occupancy loss                                                                   │
│                                                                                                  │
│ C:\Workspaces\stable-dreamfusion\nerf\sd.py:87 in train_step                                     │
│                                                                                                  │
│    84 │   │                                                                                      │
│    85 │   │   # encode image into latents with vae, requires grad!                               │
│    86 │   │   # _t = time.time()                                                                 │
│ ❱  87 │   │   latents = self.encode_imgs(pred_rgb_512)                                           │
│    88 │   │   # torch.cuda.synchronize(); print(f'[TIME] guiding: vae enc {time.time() - _t:.4   │
│    89 │   │                                                                                      │
│    90 │   │   # predict the noise residual with unet, NO grad!                                   │
│                                                                                                  │
│ C:\Workspaces\stable-dreamfusion\nerf\sd.py:161 in encode_imgs                                   │
│                                                                                                  │
│   158 │   │                                                                                      │
│   159 │   │   imgs = 2 * imgs - 1                                                                │
│   160 │   │                                                                                      │
│ ❱ 161 │   │   posterior = self.vae.encode(imgs).latent_dist                                      │
│   162 │   │   latents = posterior.sample() * 0.18215                                             │
│   163 │   │                                                                                      │
│   164 │   │   return latents                                                                     │
│                                                                                                  │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\diffusers\models\vae.py:57 │
│ 0 in encode                                                                                      │
│                                                                                                  │
│   567 │   │   self.post_quant_conv = torch.nn.Conv2d(latent_channels, latent_channels, 1)        │
│   568 │                                                                                          │
│   569 │   def encode(self, x: torch.FloatTensor, return_dict: bool = True) -> AutoencoderKLOut   │
│ ❱ 570 │   │   h = self.encoder(x)                                                                │
│   571 │   │   moments = self.quant_conv(h)                                                       │
│   572 │   │   posterior = DiagonalGaussianDistribution(moments)                                  │
│   573                                                                                            │
│                                                                                                  │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py │
│ :1190 in _call_impl                                                                              │
│                                                                                                  │
│   1187 │   │   # this function, and just call forward.                                           │
│   1188 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1189 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1190 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1191 │   │   # Do not call functions when jit is used                                          │
│   1192 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1193 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\diffusers\models\vae.py:13 │
│ 4 in forward                                                                                     │
│                                                                                                  │
│   131 │   │                                                                                      │
│   132 │   │   # down                                                                             │
│   133 │   │   for down_block in self.down_blocks:                                                │
│ ❱ 134 │   │   │   sample = down_block(sample)                                                    │
│   135 │   │                                                                                      │
│   136 │   │   # middle                                                                           │
│   137 │   │   sample = self.mid_block(sample)                                                    │
│                                                                                                  │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py │
│ :1190 in _call_impl                                                                              │
│                                                                                                  │
│   1187 │   │   # this function, and just call forward.                                           │
│   1188 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1189 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1190 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1191 │   │   # Do not call functions when jit is used                                          │
│   1192 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1193 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\diffusers\models\unet_2d_b │
│ locks.py:741 in forward                                                                          │
│                                                                                                  │
│    738 │                                                                                         │
│    739 │   def forward(self, hidden_states):                                                     │
│    740 │   │   for resnet in self.resnets:                                                       │
│ ❱  741 │   │   │   hidden_states = resnet(hidden_states, temb=None)                              │
│    742 │   │                                                                                     │
│    743 │   │   if self.downsamplers is not None:                                                 │
│    744 │   │   │   for downsampler in self.downsamplers:                                         │
│                                                                                                  │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py │
│ :1190 in _call_impl                                                                              │
│                                                                                                  │
│   1187 │   │   # this function, and just call forward.                                           │
│   1188 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1189 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1190 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1191 │   │   # Do not call functions when jit is used                                          │
│   1192 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1193 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\diffusers\models\resnet.py │
│ :399 in forward                                                                                  │
│                                                                                                  │
│   396 │   │   │   temb = self.time_emb_proj(self.nonlinearity(temb))[:, :, None, None]           │
│   397 │   │   │   hidden_states = hidden_states + temb                                           │
│   398 │   │                                                                                      │
│ ❱ 399 │   │   hidden_states = self.norm2(hidden_states)                                          │
│   400 │   │   hidden_states = self.nonlinearity(hidden_states)                                   │
│   401 │   │                                                                                      │
│   402 │   │   hidden_states = self.dropout(hidden_states)                                        │
│                                                                                                  │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py │
│ :1190 in _call_impl                                                                              │
│                                                                                                  │
│   1187 │   │   # this function, and just call forward.                                           │
│   1188 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1189 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1190 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1191 │   │   # Do not call functions when jit is used                                          │
│   1192 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1193 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\normaliza │
│ tion.py:273 in forward                                                                           │
│                                                                                                  │
│   270 │   │   │   init.zeros_(self.bias)                                                         │
│   271 │                                                                                          │
│   272 │   def forward(self, input: Tensor) -> Tensor:                                            │
│ ❱ 273 │   │   return F.group_norm(                                                               │
│   274 │   │   │   input, self.num_groups, self.weight, self.bias, self.eps)                      │
│   275 │                                                                                          │
│   276 │   def extra_repr(self) -> str:                                                           │
│                                                                                                  │
│ C:\Users\Kai\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\functional.py:252 │
│ 8 in group_norm                                                                                  │
│                                                                                                  │
│   2525 │   if has_torch_function_variadic(input, weight, bias):                                  │
│   2526 │   │   return handle_torch_function(group_norm, (input, weight, bias,), input, num_grou  │
│   2527 │   _verify_batch_size([input.size(0) * input.size(1) // num_groups, num_groups] + list(  │
│ ❱ 2528 │   return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.e  │
│   2529                                                                                           │
│   2530                                                                                           │
│   2531 def local_response_norm(input: Tensor, size: int, alpha: float = 1e-4, beta: float = 0.7  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OutOfMemoryError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 6.00 GiB total capacity; 5.28 GiB already allocated; 0 bytes free; 5.35 GiB reserved in total by PyTorch) If  
reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  0% 0/100 [00:03<?, ?it/s]
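Worth noting from the traceback: the failure is inside self.guidance.train_step, at nerf/sd.py:87, where the render has already been upsampled to a fixed 512x512 (the pred_rgb_512 variable) before the Stable Diffusion VAE encode, so that allocation does not shrink when --w and --h are reduced. A minimal runnable sketch of the shape behaviour (the interpolate call is an assumption about what the repo does; the 512x512 target is inferred from the variable name in the traceback):

import torch
import torch.nn.functional as F

# A tiny 16x16 render, as produced with --w 16 --h 16.
pred_rgb = torch.rand(1, 3, 16, 16)

# Before the VAE encode, the render is brought up to Stable Diffusion's fixed
# input size, so the encoder's memory cost is independent of the NeRF render size.
pred_rgb_512 = F.interpolate(pred_rgb, (512, 512), mode='bilinear', align_corners=False)
print(pred_rgb_512.shape)  # torch.Size([1, 3, 512, 512])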
gianluigidalessandro commented 1 year ago

Did you try to change the 'default' parameter in main.py?

I would also check out this question on Stack Overflow in case the problem persists: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch
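For completeness, the error message's own suggestion can also be tried: set max_split_size_mb via the PYTORCH_CUDA_ALLOC_CONF environment variable before CUDA initializes (or set it in the shell before launching main.py). A minimal sketch; note this only mitigates fragmentation (when reserved memory far exceeds allocated memory) and cannot create VRAM the card does not have:

import os

# Must be set before the first CUDA call, hence before importing torch.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the variable is set so the allocator picks it up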

vr-devil commented 1 year ago

Did you try to change the 'default' parameter in main.py?

Yes, I changed the default parameter in main.py directly.

I would also check out this issue on stack in case the problem persists: https://stackoverflow.com/questions/59129812/how-to-avoid-cuda-out-of-memory-in-pytorch

Thanks. I had looked at this question before but didn't try it at the time, because it would mean changing the code. I will try it later and hope it resolves the OutOfMemoryError problem.

gianluigidalessandro commented 1 year ago

Good luck ;)!

vr-devil commented 1 year ago

The author addressed this here: https://github.com/ashawkey/stable-dreamfusion/issues/41#issuecomment-1283545530

At least 12GB memory is required to run the model.

So sad, oh my god! My poor GTX 1060.
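For anyone landing here with the same question, a quick way to check whether a card meets that requirement (assuming a CUDA build of PyTorch and at least one visible GPU):

import torch

# Total VRAM reported by the driver, in GiB; a GTX 1060 reports about 6 GiB.
props = torch.cuda.get_device_properties(0)
total_gib = props.total_memory / 1024 ** 3
print(f"{props.name}: {total_gib:.1f} GiB")

if total_gib < 12:
    print("Below the ~12 GB the author says the model needs.")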