Open angelandy opened 1 year ago
+1, here is my output:
python main.py data/obama/ --workspace /train/trial_obama/ -O --iters 100000
Namespace(path='data/obama/', O=True, test=False, test_train=False, data_range=[0, -1], workspace='/train/trial_obama/', seed=0, iters=100000, lr=0.01, lr_net=0.001, ckpt='latest', num_rays=65536, cuda_ray=True, max_steps=16, num_steps=16, upsample_steps=0, update_extra_interval=16, max_ray_batch=4096, warmup_step=10000, amb_aud_loss=1, amb_eye_loss=1, unc_loss=1, lambda_amb=0.0001, fp16=True, bg_img='', fbg=False, exp_eye=True, fix_eye=-1, smooth_eye=False, torso_shrink=0.8, color_space='srgb', preload=0, bound=1, scale=4, offset=[0, 0, 0], dt_gamma=0.00390625, min_near=0.05, density_thresh=10, density_thresh_torso=0.01, patch_size=1, init_lips=False, finetune_lips=False, smooth_lips=False, torso=False, head_ckpt='', gui=False, W=450, H=450, radius=3.35, fovy=21.24, max_spp=1, att=2, aud='', emb=False, ind_dim=4, ind_num=10000, ind_dim_torso=8, amb_dim=2, part=False, part2=False, train_camera=False, smooth_path=False, smooth_path_window=7, asr=False, asr_wav='', asr_play=False, asr_model='yy', asr_save_feats=False, fps=50, l=10, m=50, r=10)
[INFO] load 7272 train frames.
[INFO] load aud_features: torch.Size([7999, 29, 16])
Loading train data: 100%|██████████████████████████████████████| 7272/7272 [00:00<00:00, 9923.96it/s]
[INFO] eye_area: 0.0 - 1.0
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/root/anaconda3/envs/geneface/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/root/anaconda3/envs/geneface/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None
for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=AlexNet_Weights.IMAGENET1K_V1
. You can also use weights=AlexNet_Weights.DEFAULT
to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /root/anaconda3/envs/geneface/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
Loading model from: /root/anaconda3/envs/geneface/lib/python3.9/site-packages/lpips/weights/v0.1/alex.pth
[INFO] Trainer: ngp | 2023-09-02_05-36-15 | cuda | fp16 | /train/trial_obama/
[INFO] #parameters: 588277
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
[INFO] load 100 val frames.
[INFO] load aud_features: torch.Size([7999, 29, 16])
Loading val data: 100%|█████████████████████████████████████████| 100/100 [00:00<00:00, 10408.48it/s]
[INFO] eye_area: 0.0 - 0.8050000071525574
[INFO] max_epoch = 14
==> Start Training Epoch 1, lr=0.001000 ...
0% 0/7272 [00:00<?, ?it/s]Traceback (most recent call last):
File "/www/wwwroot/AI/geneface/ER-NeRF-main/main.py", line 248, in
I think it is because you are using a different audio feature extraction method. HuBERT gives 1024-dimensional features, while DeepSpeech gives features shaped like [x, 29, 16].
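To illustrate the point above, here is a minimal sketch that guesses which extractor produced a feature array by looking at its channel axis. The helper name `guess_extractor` and the lookup table are hypothetical and not part of ER-NeRF; the two shapes are the ones printed in the logs in this thread, and the sketch assumes the channel count sits at index 1 as it does there.

```python
# Hypothetical helper (not part of ER-NeRF): guess the audio feature
# extractor from the shape of a loaded feature array.
# Assumption: the channel axis is index 1, as in the shapes logged in this thread.
KNOWN_CHANNELS = {29: "deepspeech", 1024: "hubert"}


def guess_extractor(shape):
    """Return the extractor name implied by the channel axis, or None if unknown."""
    if len(shape) < 2:
        return None
    return KNOWN_CHANNELS.get(shape[1])


# Shapes taken from the "[INFO] load ... aud_features" lines in the logs.
print(guess_extractor((7999, 29, 16)))   # features loaded during training
print(guess_extractor((371, 1024, 2)))   # features loaded from data/1_hu.npy
```

If the two calls give different names, the features passed with --aud were not produced by the same extractor as the training features.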
Hi, thank you for your project. Here is my error; I do not know which step is wrong.
Here is my command: python main.py data/obama/ --workspace trial_obama/ -O --test --test_train --aud data/1_hu.npy
and I got this message:
root@d8e5bdfb3898:/data/ER-NeRF-main# python main.py data/obama/ --workspace trial_obama/ -O --test --test_train --aud data/1_hu.npy
Namespace(H=450, O=True, W=450, amb_aud_loss=1, amb_dim=2, amb_eye_loss=1, asr=False, asr_model='deepspeech', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='data/1_hu.npy', bg_img='', bound=1, ckpt='latest', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=False, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=10000, init_lips=False, iters=200000, l=10, lambda_amb=0.0001, lr=0.01, lr_net=0.001, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, path='data/obama/', preload=0, r=10, radius=3.35, scale=4, seed=0, smooth_eye=False, smooth_lips=False, smooth_path=False, smooth_path_window=7, test=True, test_train=True, torso=False, torso_shrink=0.8, train_camera=False, unc_loss=1, update_extra_interval=16, upsample_steps=0, warmup_step=10000, workspace='trial_obama/')
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/opt/conda/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/opt/conda/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /opt/conda/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/opt/conda/lib/python3.8/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/opt/conda/lib/python3.8/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /opt/conda/lib/python3.8/site-packages/lpips/weights/v0.1/alex.pth
[INFO] Trainer: ngp | 2023-08-25_10-10-16 | cuda | fp16 | trial_obama/
[INFO] #parameters: 587989
[INFO] Loading latest checkpoint ...
[WARN] No checkpoint found, model randomly initialized.
[INFO] load 7272 train frames.
[INFO] load data/1_hu.npy aud_features: torch.Size([371, 1024, 2])
Loading train data: 100%|██████████████████████████████████████| 7272/7272 [00:01<00:00, 6587.96it/s]
[INFO] eye_area: 0.0 - 1.0
==> Start Test, save results to trial_obama/results
0% 0/371 [00:00<?, ?it/s]Traceback (most recent call last):
  File "main.py", line 206, in <module>
    trainer.test(test_loader)
  File "/data/ER-NeRF-main/nerf_triplane/utils.py", line 1023, in test
    preds, preds_depth = self.test_step(data)
  File "/data/ER-NeRF-main/nerf_triplane/utils.py", line 939, in test_step
    outputs = self.model.render(rays_o, rays_d, auds, bg_coords, poses, eye=eye, index=index, staged=True, bg_color=bg_color, perturb=perturb, **vars(self.opt))
  File "/data/ER-NeRF-main/nerf_triplane/renderer.py", line 675, in render
    results = _run(rays_o, rays_d, auds, bg_coords, poses, **kwargs)
  File "/data/ER-NeRF-main/nerf_triplane/renderer.py", line 188, in run_cuda
    enc_a = self.encode_audio(auds) # [1, 64]
  File "/data/ER-NeRF-main/nerf_triplane/network.py", line 232, in encode_audio
    enc_a = self.audio_net(a) # [1/8, 64]
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1131, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/ER-NeRF-main/nerf_triplane/network.py", line 64, in forward
    x = self.encoder_conv(x).squeeze(-1)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1131, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1131, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 309, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 305, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [32, 29, 3], expected input[8, 1024, 2] to have 29 channels, but got 1024 channels instead
0% 0/371 [00:00<?, ?it/s]
Can you solve this problem?
Hi, can you solve this problem?
It looks like you are missing the argument "--asr_model hubert".
The pretrained model was trained with the DeepSpeech audio extractor, which is why it expects 29 channels, whereas HuBERT features have 1024 channels.
If you want to use HuBERT features, you will probably have to train the model from scratch: run process.py with --asr_model hubert and train again.
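The mismatch described above can be caught before rendering starts. The sketch below is a hypothetical pre-flight check, not code from ER-NeRF: it compares the channel axis of the audio features against the input channels implied by a 1D convolution weight of shape [out_channels, in_channels / groups, kernel_size], which is the layout the RuntimeError in the traceback reports. The function name `check_conv1d_input` is an assumption made for illustration.

```python
# Hypothetical pre-flight check (not part of ER-NeRF): verify that audio
# features have the channel count the first audio conv layer expects.
def check_conv1d_input(weight_shape, input_shape, groups=1):
    """Raise ValueError if input channels do not match the conv1d weight.

    weight_shape: (out_channels, in_channels // groups, kernel_size)
    input_shape:  (batch, channels, length)
    """
    expected = weight_shape[1] * groups
    actual = input_shape[1]
    if actual != expected:
        raise ValueError(
            f"audio encoder expects {expected} channels but the features "
            f"have {actual}; they likely come from a different extractor"
        )
    return True


# Shapes copied from the RuntimeError in the traceback.
try:
    check_conv1d_input((32, 29, 3), (8, 1024, 2))
except ValueError as err:
    print(err)
```

Running a check like this right after loading the .npy file gives a clearer message than the convolution error that appears several calls deep in the renderer.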