ashawkey / RAD-NeRF

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

Given input size: (192x2x2). Calculated output size: (192x0x0). Output size is too small #25

Open aishoot opened 1 year ago

aishoot commented 1 year ago

Hello, thanks for your nice work. When I run the code on my own video, the following error appears.

~/MyCode/RAD_NeRF$ python main.py data/person_video1_25fps_512x512/ --workspace person_video1/ -O --iters 250000 --finetune_lips
Namespace(H=450, O=True, W=450, amb_dim=2, asr=False, asr_model='cpierse/wav2vec2-large-xlsr-53-esperanto', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='', bg_img='', bound=1, ckpt='latest', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=True, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=10000, iters=250000, l=10, lambda_amb=0.1, lr=0.005, lr_net=0.0005, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, path='data/person_video1_25fps_512x512/', preload=0, r=10, radius=3.35, scale=4, seed=0, smooth_eye=False, smooth_lips=False, smooth_path=False, smooth_path_window=7, test=False, test_train=False, torso=False, torso_shrink=0.8, train_camera=False, update_extra_interval=1000000000.0, upsample_steps=0, workspace='person_video1/')
[INFO] load 783 train frames.
[INFO] load  aud_features: torch.Size([861, 44, 16])
Loading train data: 100%|███████████████████████████████████████████████| 783/783 [00:00<00:00, 2328.66it/s]
[INFO] eye_area: 0.02593994140625 - 0.06561279296875
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/home/anaconda3/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/home/anaconda3/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading model from: /home/anaconda3/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /home/anaconda3/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
[INFO] Trainer: ngp | 2023-02-23_12-56-18 | cuda | fp16 | person_video1/
[INFO] #parameters: 3024277
[INFO] Loading latest checkpoint ...
[INFO] Latest checkpoint is person_video1/checkpoints/ngp_ep0256.pth
[INFO] loaded model.
[INFO] load at epoch 256, global step 200448
[INFO] loaded optimizer.
[INFO] loaded scheduler.
[INFO] loaded scaler.
[INFO] load 79 val frames.
[INFO] load  aud_features: torch.Size([861, 44, 16])
Loading val data: 100%|███████████████████████████████████████████████████| 79/79 [00:00<00:00, 2280.58it/s]
[INFO] eye_area: 0.0255584716796875 - 0.0614166259765625
[INFO] max_epoch = 320
==> Start Training Epoch 257, lr=0.000050 ...
loss=0.0001 (0.0004), lr=0.000045: :   0% 1/783 [00:01<13:32,  1.04s/it]Traceback (most recent call last):
  File "main.py", line 253, in <module>
    trainer.train(train_loader, valid_loader, max_epoch)
  File "/home/MyCode/ashawkeyRAD_NeRF/nerf/utils.py", line 906, in train
    self.train_one_epoch(train_loader)
  File "/home/MyCode/ashawkeyRAD_NeRF/nerf/utils.py", line 1169, in train_one_epoch
    preds, truths, loss = self.train_step(data)
  File "/home/MyCode/ashawkeyRAD_NeRF/nerf/utils.py", line 766, in train_step
    loss = loss + 0.01 * self.criterion_lpips(pred_rgb, rgb)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/anaconda3/lib/python3.8/site-packages/lpips/lpips.py", line 119, in forward
    outs0, outs1 = self.net.forward(in0_input), self.net.forward(in1_input)
  File "/home/anaconda3/lib/python3.8/site-packages/lpips/pretrained_networks.py", line 85, in forward
    h = self.slice3(h)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/modules/pooling.py", line 166, in forward
    return F.max_pool2d(input, self.kernel_size, self.stride,
  File "/home/anaconda3/lib/python3.8/site-packages/torch/_jit_internal.py", line 485, in fn
    return if_false(*args, **kwargs)
  File "/home/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 782, in _max_pool2d
    return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)

RuntimeError: Given input size: (192x2x2). Calculated output size: (192x0x0). Output size is too small
loss=0.0001 (0.0004), lr=0.000045: :   0% 2/783 [00:01<07:52,  1.65it/s]
afan249 commented 1 year ago

Has this problem been solved?

aishoot commented 1 year ago

> Has this problem been solved?

Solved it.

afan249 commented 1 year ago

How did you solve it? I'm running into the same problem here.

OceanTan commented 1 year ago

> Has this problem been solved?
>
> Solved it.

Hi, could you share your solution? I'm also getting this error. My video is 512×512 pixels.

chanchanalina commented 1 year ago

Hello, can you share your solution? Thank you very much.

OceanTan commented 1 year ago

> Hello, can you share your solution? Thank you very much.

The problem is with your video's face parsing. Take a look at face_parsing/test.py and check the parsing results it produces for your frames.
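
For context: with `--finetune_lips`, the trainer computes the LPIPS loss on a patch cropped around the lips. If face parsing failed on some frames, that lip region can degenerate to just a few pixels, and AlexNet's pooling stack collapses it to 0×0, which is exactly the `(192x2x2) -> (192x0x0)` error above. Below is a minimal sketch for auditing the parsing output, assuming preprocessing wrote per-frame parsing maps to `data/<id>/parsing/*.png`; `PARSING_DIR` and `LIP_COLOR` are assumptions to adapt to your own data:

```python
# Sketch: flag frames whose parsed lip region is too small for the LPIPS crop.
# Assumptions to verify against your own preprocessing output:
#   - face_parsing/test.py wrote per-frame parsing maps to data/<id>/parsing/*.png
#   - LIP_COLOR is the RGB value the lip classes get in those maps (hypothetical here)
import glob

import numpy as np
from PIL import Image

PARSING_DIR = "data/person_video1_25fps_512x512/parsing"  # assumed layout
LIP_COLOR = (255, 0, 0)  # hypothetical; open one map and read off the real value
MIN_SIDE = 32            # roughly the smallest input AlexNet's pool stack tolerates

for path in sorted(glob.glob(f"{PARSING_DIR}/*.png")):
    seg = np.array(Image.open(path).convert("RGB"))
    # Boolean mask of pixels labeled as lips in the parsing map.
    mask = np.all(seg == LIP_COLOR, axis=-1)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        print(f"{path}: no lip pixels at all -- parsing likely failed on this frame")
        continue
    h, w = ys.max() - ys.min() + 1, xs.max() - xs.min() + 1
    if h < MIN_SIDE or w < MIN_SIDE:
        print(f"{path}: lip box is only {w}x{h} px, too small for the LPIPS patch")
```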

OceanTan commented 1 year ago

> Hello, can you share your solution? Thank you very much.

https://github.com/ashawkey/RAD-NeRF/issues/46
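
The linked issue tracks the same error. Independent of the root-cause fix (regenerating the face parsing), a defensive stopgap is to guard the LPIPS term against undersized patches. A sketch follows, assuming `pred_rgb` and `rgb` are `[B, 3, H, W]` patches as suggested by the traceback; this is not the repository's actual fix:

```python
# Hypothetical stopgap around the failing line in nerf/utils.py (train_step):
#     loss = loss + 0.01 * self.criterion_lpips(pred_rgb, rgb)
# LPIPS' AlexNet trunk collapses inputs below roughly 32 px per side to 0x0
# at its third max-pool, which is the RuntimeError in the traceback above.
import torch.nn.functional as F

MIN_LPIPS_SIDE = 32  # assumed safe minimum for the AlexNet trunk

def lpips_term(criterion_lpips, pred_rgb, rgb, weight=0.01):
    """pred_rgb, rgb: [B, 3, H, W] patches (shapes assumed from the traceback)."""
    if min(pred_rgb.shape[-2:]) < MIN_LPIPS_SIDE:
        # Option A: skip the perceptual term entirely for degenerate crops ...
        # return pred_rgb.new_zeros(())
        # Option B: ... or upsample both patches to a safe size first.
        size = (MIN_LPIPS_SIDE, MIN_LPIPS_SIDE)
        pred_rgb = F.interpolate(pred_rgb, size=size, mode="bilinear", align_corners=False)
        rgb = F.interpolate(rgb, size=size, mode="bilinear", align_corners=False)
    return weight * criterion_lpips(pred_rgb, rgb)
```

Skipping the term drops the perceptual loss on degenerate frames; upsampling keeps it at the cost of blurred supervision. Either way this only masks the symptom, so regenerating the face parsing remains the real fix.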

ahkimkoo commented 7 months ago

> Hello, can you share your solution? Thank you very much.
>
> The problem is with your video's face parsing. Take a look at face_parsing/test.py.

Sorry, what is the purpose of looking at this source code? Do you know how to solve this problem?

phoeenniixx commented 1 month ago

> Hello, thanks for your nice work. When I run the code on my own video, the following error appears.
>
> [quotes the full command, log, and `RuntimeError: Given input size: (192x2x2). Calculated output size: (192x0x0). Output size is too small` traceback from the original post above]

I'm facing the same issue. Can you please help, @aishoot? I would really appreciate it, thank you.