ashawkey / RAD-NeRF

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

Killed when I want to use --preload 2 #62

Open zhang010930 opened 1 year ago

zhang010930 commented 1 year ago

I saw in the documentation that you can use --preload 2 to make training faster, but when I run it on a server with a 3090 and 80 GB of RAM, the process gets killed. The documentation says 24 GB of memory is enough. Why is this?

    root@autodl-container-bde111aa08-2431f782:~/autodl-tmp/RAD-NeRF/RAD-NeRF-main# python main.py data/nv2zhongwen/ --workspace trial_nv2zhongwen/ -O --iters 200000 --preload 2
    Namespace(H=450, O=True, W=450, amb_dim=2, asr=False, asr_model='cpierse/wav2vec2-large-xlsr-53-esperanto', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='', bg_img='', bound=1, ckpt='latest', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=False, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=10000, iters=200000, l=10, lambda_amb=0.1, lr=0.005, lr_net=0.0005, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, path='data/nv2zhongwen/', preload=2, r=10, radius=3.35, scale=4, seed=0, smooth_eye=False, smooth_lips=False, smooth_path=False, smooth_path_window=7, test=False, test_train=False, torso=False, torso_shrink=0.8, train_camera=False, update_extra_interval=16, upsample_steps=0, workspace='trial_nv2zhongwen/')
    [INFO] load 6821 train frames.
    [INFO] load aud_features: torch.Size([7504, 44, 16])
    Loading train data: 100%|████████████████████████████████████████████████████████████████████████████| 6821/6821 [01:21<00:00, 84.20it/s]
    Killed

    root@autodl-container-bde111aa08-2431f782:~/autodl-tmp/RAD-NeRF/RAD-NeRF-main# python main.py data/nv2zhongwen/ --workspace trial_nv2zhongwen/ -O --iters 200000 --preload 1
    Namespace(H=450, O=True, W=450, amb_dim=2, asr=False, asr_model='cpierse/wav2vec2-large-xlsr-53-esperanto', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='', bg_img='', bound=1, ckpt='latest', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=False, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=10000, iters=200000, l=10, lambda_amb=0.1, lr=0.005, lr_net=0.0005, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, path='data/nv2zhongwen/', preload=1, r=10, radius=3.35, scale=4, seed=0, smooth_eye=False, smooth_lips=False, smooth_path=False, smooth_path_window=7, test=False, test_train=False, torso=False, torso_shrink=0.8, train_camera=False, update_extra_interval=16, upsample_steps=0, workspace='trial_nv2zhongwen/')
    [INFO] load 6821 train frames.
    [INFO] load aud_features: torch.Size([7504, 44, 16])
    Loading train data: 100%|████████████████████████████████████████████████████████████████████████████| 6821/6821 [01:26<00:00, 78.93it/s]
    Killed


Marvinified commented 1 year ago

Experiencing the same issue when I use --preload 2.

While the dataset is loading, RAM usage keeps climbing until it maxes out and the process gets killed.

I've been stuck on this for a few hours now.
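In case it helps others reproduce the symptom, here is a small sketch for confirming that it is host RAM (not GPU memory) being exhausted; the bare "Killed" in the log is the kernel's OOM killer terminating the process. This assumes the third-party psutil package and that you pass the PID of the python main.py process (the script name watch_rss.py is just an example):

    import sys
    import time
    import psutil  # third-party: pip install psutil

    # usage: python watch_rss.py <pid of the training process>
    proc = psutil.Process(int(sys.argv[1]))
    try:
        while True:
            rss_gib = proc.memory_info().rss / 1024 ** 3
            print(f"RSS: {rss_gib:.1f} GiB", flush=True)
            time.sleep(5)
    except psutil.NoSuchProcess:
        print("process exited (OOM-killed if RSS was climbing toward the machine's limit)")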

Marvinified commented 1 year ago

After further investigation, I found the following lines of code to be the culprit:

https://github.com/ashawkey/RAD-NeRF/blob/0de5ed259592592294677ad6cf7605f478a0de57/nerf/provider.py#L530-L532

Because np.stack allocates a brand-new array while the existing per-frame images are still in memory, the transient footprint roughly doubles, and this maxed out my 80 GB of RAM once it reached line 532.

See the answer to this Stack Overflow question for more details: https://stackoverflow.com/questions/31268998/how-to-merge-two-large-numpy-arrays-if-slicing-doesnt-resolve-memory-error
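For a rough sense of the scale involved, here is a back-of-the-envelope sketch; the frame count comes from the log above, while the per-frame resolution depends on your preprocessed data, so 512x512x3 float32 is only a placeholder:

    import numpy as np

    def peak_stack_gib(n_frames, h, w, c=3, dtype=np.float32):
        """Rough peak RAM (GiB) while np.stack builds one [N, H, W, C] array.

        The list of per-frame arrays and the freshly stacked copy coexist for
        a moment, so the transient footprint is roughly twice the final array.
        """
        one_copy = n_frames * h * w * c * np.dtype(dtype).itemsize
        return 2 * one_copy / 1024 ** 3

    # 6821 frames as in the log above; 512x512x3 float32 is a placeholder.
    print(f"~{peak_stack_gib(6821, 512, 512):.0f} GiB transient peak for the head images alone")

Since the torso images are stacked the same way (I patched both, see below), the transient peaks add up on top of the audio features and other buffers, which is how even 80 GB can be exhausted.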

The solution was to rewrite NerfDataset to use the technique from that answer.

Here is the important snippet to give an idea; I did the same for both self.images and self.torso_images:

          # inside the dataset's image-loading code in nerf/provider.py
          # TODO: dynamically determine the last dim of the shape; it can be 4 or 3
          # preallocate the full buffer once, then fill it in place
          # (avoids the extra full-size copy that np.stack makes)
          self.images = np.empty((len(frames), self.H, self.W, 3), dtype=np.float32) # [N, H, W, C]
          index = 0
          for f in tqdm.tqdm(frames, desc=f'Preloading images {type} data'):
              f_path = os.path.join(self.root_path, 'gt_imgs', str(f['img_id']) + '.jpg')
              image = cv2.imread(f_path, cv2.IMREAD_UNCHANGED) # [H, W, 3] or [H, W, 4]
              image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
              image = image.astype(np.float32) / 255 # [H, W, 3/4]
              self.images[index] = image
              index += 1

          self.images = torch.from_numpy(self.images) # [N, H, W, C] (shares the numpy buffer, no copy)
          if self.preload > 1:
              # --preload 2: move to GPU in half precision
              self.images = self.images.to(torch.half).to(self.device)
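If you want to adapt this to self.torso_images as well, here is a minimal standalone sketch of the same preallocation idea that also probes the channel count instead of hard-coding it (the function name and the subdir/ext defaults are illustrative, not the repository's exact layout):

    import os
    import cv2
    import numpy as np
    import tqdm

    def preload_frames(root_path, frames, H, W, subdir='gt_imgs', ext='.jpg'):
        """Fill one preallocated [N, H, W, C] float32 buffer frame by frame.

        Writing straight into the buffer avoids the extra full-size copy that
        np.stack would make at the end; C is probed from the first frame, so
        3-channel (RGB) and 4-channel (RGBA) data both work.
        """
        first = cv2.imread(os.path.join(root_path, subdir, str(frames[0]['img_id']) + ext),
                           cv2.IMREAD_UNCHANGED)
        channels = first.shape[-1]  # assumes 3- or 4-channel frames

        images = np.empty((len(frames), H, W, channels), dtype=np.float32)
        for index, f in enumerate(tqdm.tqdm(frames, desc=f'Preloading {subdir}')):
            image = cv2.imread(os.path.join(root_path, subdir, str(f['img_id']) + ext),
                               cv2.IMREAD_UNCHANGED)
            code = cv2.COLOR_BGRA2RGBA if channels == 4 else cv2.COLOR_BGR2RGB
            images[index] = cv2.cvtColor(image, code).astype(np.float32) / 255
        return images

The returned array can then be wrapped with torch.from_numpy (which shares the buffer instead of copying) and, for --preload 2, converted to half precision and moved to the GPU exactly as in the snippet above.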

Context: I was training on a video a bit over 5 minutes long on an A100 80G, with 12 CPU cores and 80 GB of RAM for testing.

PS: I will try to make time to open a PR for this when I get the chance. cc: @ashawkey

wudidecc commented 10 months ago

(quoting @Marvinified's comment above in full)

May I ask which specific lines you modified? Alternatively, could you please send your modified NerfDataset to my email, 907551572@qq.com. Your help is very important to me, and I hope to receive it.