Fictionarry / ER-NeRF

[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
https://fictionarry.github.io/ER-NeRF/
MIT License
894 stars 124 forks

extract_audio_features raises an error during video preprocessing #91

Open liwu96 opened 7 months ago

liwu96 commented 7 months ago

I have a question. When I run python data_utils/process.py data//.mp4, I get the following error:

Traceback (most recent call last):
  File "/home/work/ER-NeRF/data_utils/deepspeech_features/extract_ds_features.py", line 131, in <module>
    main()
  File "/home/work/ER-NeRF/data_utils/deepspeech_features/extract_ds_features.py", line 107, in main
    extract_features(
  File "/home/work/ER-NeRF/data_utils/deepspeech_features/extract_ds_features.py", line 80, in extract_features
    conv_audios_to_deepspeech(
  File "/home/work/ER-NeRF/data_utils/deepspeech_features/deepspeech_features.py", line 53, in conv_audios_to_deepspeech
    ds_features = pure_conv_audio_to_deepspeech(
  File "/home/work/ER-NeRF/data_utils/deepspeech_features/deepspeech_features.py", line 149, in pure_conv_audio_to_deepspeech
    input_vector = conv_audio_to_deepspeech_input_vector(
  File "/home/work/ER-NeRF/data_utils/deepspeech_features/deepspeech_features.py", line 220, in conv_audio_to_deepspeech_input_vector
    features = np.concatenate((empty_context, features, empty_context))
  File "<__array_function__ internals>", line 180, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 26 and the array at index 1 has size 23
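The ValueError says that np.concatenate was asked to stack a padding array with 26 columns onto a feature array with 23 columns. Why the feature array is 23 wide is not established in this thread. The sketch below only shows the shape rule and a guard that reports the mismatch before concatenating. The helper name pad_with_context and the context length of 9 are hypothetical, chosen for illustration; only the width 26 comes from the traceback.

```python
import numpy as np

N_CEPSTRUM = 26  # column count of the padding array, as reported in the traceback
N_CONTEXT = 9    # hypothetical number of padding frames, for illustration only


def pad_with_context(features, n_context=N_CONTEXT, n_cepstrum=N_CEPSTRUM):
    """Pad a (frames, n_cepstrum) array with zero frames on both ends.

    Concatenating along axis 0 requires every array to have the same
    number of columns, so the width is checked first and reported clearly.
    """
    features = np.asarray(features)
    if features.ndim != 2 or features.shape[1] != n_cepstrum:
        raise ValueError(
            f"expected features of shape (frames, {n_cepstrum}), "
            f"got {features.shape}"
        )
    empty_context = np.zeros((n_context, n_cepstrum), dtype=features.dtype)
    return np.concatenate((empty_context, features, empty_context))
```

Printing features.shape just before the failing concatenate call in deepspeech_features.py would give the same information without changing behavior.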

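Is this error caused by a problem with the audio in the video? My footage comes from Xinwen Lianbo (the CCTV news broadcast). I tried converting the video's audio to a 16000 Hz sample rate with ffmpeg, but it made no difference.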
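One way to rule the audio in or out is to inspect the extracted WAV header and confirm that the conversion actually produced what was intended. The sketch below is an assumption-laden diagnostic, not a confirmed fix: the thread does not establish whether sample rate or channel count is related to the error. The helper names describe_wav and ffmpeg_extract_cmd are hypothetical; the ffmpeg options used (-y, -i, -vn, -ac, -ar, -f) are standard ones.

```python
import wave


def describe_wav(path):
    """Return (sample_rate, channels, sample_width_bytes) from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth()


def ffmpeg_extract_cmd(video_path, wav_path, sample_rate=16000, channels=1):
    """Build an ffmpeg argument list that extracts audio as WAV.

    Forcing a single channel is an assumption made for this sketch; the
    thread only mentions converting to a 16000 Hz sample rate.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,
        "-vn",                   # drop the video stream
        "-ac", str(channels),    # number of audio channels
        "-ar", str(sample_rate), # audio sample rate in Hz
        "-f", "wav",
        wav_path,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True), and describe_wav can then be called on the output file to confirm the rate and channel count.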

liwu96 commented 7 months ago

I looked at the code. When process.py is run with --asr wav2vec, it calls python nerf/asr.py, but that path no longer exists. Should it be changed to nerf_triplane/asr.py? If I do not pass --asr, extract_audio_features hits the error above. I would appreciate a reply.

def extract_audio_features(path, mode='wav2vec'):

    print(f'[INFO] ===== extract audio labels for {path} =====')
    if mode == 'wav2vec':
        cmd = f'python nerf/asr.py --wav {path} --save_feats'
    else: # deepspeech
        cmd = f'python data_utils/deepspeech_features/extract_ds_features.py --input {path}'
    os.system(cmd)
    print(f'[INFO] ===== extracted audio labels =====')

lzhchina commented 7 months ago

def extract_audio_features(path, mode='wav2vec'):

    print(f'[INFO] ===== extract audio labels for {path} =====')
    if mode == 'wav2vec':
        # cmd = f'python nerf/asr.py --wav {path} --save_feats'
        cmd = f'python data_utils/wav2vec.py --wav {path} --save_feats'
    else: # deepspeech
        cmd = f'python data_utils/deepspeech_features/extract_ds_features.py --input {path}'
    os.system(cmd)
    print(f'[INFO] ===== extracted audio labels =====')
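The reply above points the wav2vec branch at data_utils/wav2vec.py. Because os.system ignores a failing exit status, a wrong script path can go unnoticed until a later step. A possible variant, sketched here as a suggestion rather than the repository's code, builds the command as an argument list and lets subprocess.run raise on failure. The helper build_audio_feature_cmd is hypothetical, and unlike the original it rejects unknown modes instead of falling back to deepspeech.

```python
import subprocess
import sys


def build_audio_feature_cmd(path, mode="wav2vec"):
    """Return the argument list for the chosen audio feature extractor."""
    if mode == "wav2vec":
        # Script path taken from the reply in this thread.
        return [sys.executable, "data_utils/wav2vec.py", "--wav", path, "--save_feats"]
    if mode == "deepspeech":
        return [
            sys.executable,
            "data_utils/deepspeech_features/extract_ds_features.py",
            "--input", path,
        ]
    raise ValueError(f"unknown mode: {mode!r}")


def extract_audio_features(path, mode="wav2vec"):
    """Run the extractor and raise CalledProcessError if it exits non-zero."""
    print(f"[INFO] ===== extract audio labels for {path} =====")
    subprocess.run(build_audio_feature_cmd(path, mode), check=True)
    print("[INFO] ===== extracted audio labels =====")
```

Passing the path as a separate list element also avoids shell quoting problems when the file name contains spaces.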