ashawkey / RAD-NeRF

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
MIT License

The result mp4 video does not have sound #95

Open Jeriousman opened 3 months ago

Jeriousman commented 3 months ago

I ran the following command:

python test.py --pose data/obama/obama.json --ckpt pretrained/obama_eo.pth --aud data/obama/trump_eo.npy --workspace trial_obama/ -O --torso

Namespace(H=450, O=True, W=450, amb_dim=2, asr=False, asr_model='cpierse/wav2vec2-large-xlsr-53-esperanto', asr_play=False, asr_save_feats=False, asr_wav='', att=2, aud='data/obama/trump_eo.npy', bg_img='white', bound=1, ckpt='pretrained/obama_eo.pth', color_space='srgb', cuda_ray=True, data_range=[0, -1], density_thresh=10, density_thresh_torso=0.01, dt_gamma=0.00390625, emb=False, exp_eye=True, fbg=False, finetune_lips=False, fix_eye=-1, fovy=21.24, fp16=True, fps=50, gui=False, head_ckpt='', ind_dim=4, ind_dim_torso=8, ind_num=10000, l=10, lambda_amb=0.1, m=50, max_ray_batch=4096, max_spp=1, max_steps=16, min_near=0.05, num_rays=65536, num_steps=16, offset=[0, 0, 0], part=False, part2=False, patch_size=1, pose='data/obama/obama.json', r=10, radius=3.35, scale=4, seed=0, smooth_eye=True, smooth_lips=True, smooth_path=True, smooth_path_window=7, test=True, test_train=False, torso=True, torso_shrink=0.8, train_camera=False, update_extra_interval=16, upsample_steps=0, workspace='trial_obama/')
[INFO] Trainer: ngp | 2024-04-16_17-17-44 | cuda | fp16 | trial_obama/
[INFO] #parameters: 4231701
[INFO] Loading pretrained/obama_eo.pth ...
[INFO] loaded model.
[WARN] missing keys: ['density_grid']
[INFO] load at epoch 28, global step 203616
[INFO] load 7272 frames.
[INFO] load data/obama/trump_eo.npy aud_features: torch.Size([759, 44, 16])
Loading data: 100%|██████████████████████████████████████████████████████████████████████████████████| 7272/7272 [00:00<00:00, 47451.07it/s]
[INFO] eye_area: 0.25 - 0.25
==> Start Test, save results to trial_obama/results
100% 757/759 [00:23<00:00, 36.95it/s][swscaler @ 0x6561f00] Warning: data is not aligned! This can lead to a speed loss
==> Finished Test.
100% 759/759 [00:27<00:00, 28.02it/s]

The resulting video has no sound. Why is that? Is it only me, or are other people seeing the same thing? If anyone has solved this, could you share how? The trump.wav file that trump_eo.npy was generated from does have sound.

shivankar-p commented 1 month ago

Did you manage to fix this? I am having the same issue.
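
A possible explanation, not confirmed anywhere in this thread: the command above only passes extracted audio features (trump_eo.npy) to test.py, not the waveform itself, so the renderer may have no audio track to write and the output would be silent by design. If that is the case, the audio can be added afterwards with ffmpeg. The sketch below assumes ffmpeg is installed and on PATH. The function names and the file paths in the usage comment are hypothetical placeholders; substitute the actual .mp4 found in trial_obama/results and the original .wav file.

```python
import shutil
import subprocess
from pathlib import Path


def build_mux_command(video, audio, output):
    """Build an ffmpeg argument list that keeps the rendered video stream
    untouched and attaches the given audio file as an AAC track."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", str(video),        # input 0: silent rendered video
        "-i", str(audio),        # input 1: original speech audio
        "-map", "0:v:0",         # take the video stream from input 0
        "-map", "1:a:0",         # take the audio stream from input 1
        "-c:v", "copy",          # do not re-encode the video
        "-c:a", "aac",           # encode audio as AAC for the mp4 container
        "-shortest",             # stop at the end of the shorter stream
        str(output),
    ]


def mux_audio(video, audio, output):
    """Run ffmpeg to combine a silent video with an audio file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    for path in (video, audio):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    subprocess.run(build_mux_command(video, audio, output), check=True)


# Hypothetical usage (replace the placeholder paths with real ones):
# mux_audio("trial_obama/results/<rendered>.mp4",
#           "<path to>/trump.wav",
#           "trial_obama/results/with_audio.mp4")
```

The video stream is copied rather than re-encoded, so the step is fast and does not change image quality. `-shortest` trims the output to whichever of the two inputs ends first, which helps if the audio and the rendered frames differ slightly in length.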