Hangz-nju-cuhk / Talking-Face_PC-AVS

Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)
Creative Commons Attribution 4.0 International

TypeError: mel() takes 0 positional arguments but 2 positional arguments (and 3 keyword-only arguments) were given #65

Closed TengTengCai closed 1 year ago

TengTengCai commented 1 year ago

System: Windows 10
Python: 3.8.10
requirements.txt:

```
absl-py==1.4.0
audioread==3.0.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.0.1
colorama==0.4.6
contourpy==1.0.7
cycler==0.11.0
decorator==5.1.1
dill==0.3.6
dominate==2.7.0
fonttools==4.38.0
grpcio==1.51.3
idna==3.4
imageio==2.26.0
importlib-metadata==6.0.0
importlib-resources==5.12.0
joblib==1.2.0
kiwisolver==1.4.4
lazy_loader==0.1
librosa==0.10.0
llvmlite==0.39.1
lws==1.2.7
Markdown==3.4.1
MarkupSafe==2.1.2
matplotlib==3.7.0
msgpack==1.0.4
networkx==3.0
numba==0.56.4
numpy==1.23.5
opencv-python==4.7.0.72
packaging==23.0
Pillow==9.4.0
platformdirs==3.0.0
pooch==1.7.0
protobuf==4.22.0
pycparser==2.21
pyparsing==3.0.9
python-dateutil==2.8.2
PyWavelets==1.4.1
requests==2.28.2
scikit-image==0.20.0
scikit-learn==1.2.1
scipy==1.9.1
six==1.16.0
soundfile==0.12.1
soxr==0.3.3
tensorboard==1.14.0
threadpoolctl==3.1.0
tifffile==2023.2.28
torch==1.13.1+cu117
torchaudio==0.13.1+cu117
torchvision==0.14.1+cu117
tqdm==4.64.1
typing_extensions==4.5.0
urllib3==1.26.14
Werkzeug==2.2.3
zipp==3.15.0
```

I was running:

```
python -u inference.py --name demo --meta_path_vox ./misc/demo.csv --dataset_mode voxtest --netG modulate --netA resseaudio --netA_sync ressesync --netD multiscale --netV resnext --netE fan --model av --gpu_ids 0 --clip_len 1 --batchSize 4 --style_dim 2560 --nThreads 1 --input_id_feature --generate_interval 1 --style_feature_loss --use_audio 1 --noise_pose --driving_pose --gen_video --generate_from_audio_only
```

and got this output:

```
----------------- Options ---------------
D_input: single VGGFace_pretrain_path: aspect_ratio: 1.0 audio_nc: 256 augment_target: False batchSize: 4 [default: 2] beta1: 0.5 beta2: 0.999 checkpoints_dir: ./checkpoints clip_len: 1 crop: False crop_len: 16 crop_size: 224 data_path: /home/SENSETIME/zhouhang1/Downloads/VoxCeleb2/voxceleb2_train.csv dataset_mode: voxtest defined_driven: False dis_feat_rec: False display_winsize: 224 driven_type: face driving_pose: True [default: False] feature_encoded_dim: 2560 feature_fusion: concat filename_tmpl: {:06}.jpg fitting_iterations: 10 frame_interval: 1 frame_rate: 25 gan_mode: hinge gen_video: True [default: False] generate_from_audio_only: True [default: False] generate_interval: 1 gpu_ids: 0 has_mask: False heatmap_size: 3 hop_size: 160 how_many: inf init_type: xavier init_variance: 0.02 input_id_feature: True [default: False] input_path: ./checkpoints/results/input_path isTrain: False [default: None] label_mask: False lambda_D: 1 lambda_contrastive: 100 lambda_crossmodal: 1 lambda_feat: 10.0 lambda_image: 1.0 lambda_rotate_D: 0.1 [default: 0.1] lambda_softmax: 1000000 lambda_vgg: 10.0 lambda_vggface: 5.0 landmark_align: False landmark_type: min list_end: inf list_num: 0 list_start: 0 load_from_opt_file: False load_landmark: False lr: 0.001 lrw_data_path: /home/SENSETIME/zhouhang1/Downloads/VoxCeleb2/voxceleb2_train.csv max_dataset_size: 9223372036854775807 meta_path_vox: ./misc/demo.csv mode: cpu model: av multi_gpu: False nThreads: 1 n_mel_T: 4 name: demo ndf: 64 nef: 16 netA: resseaudio netA_sync: ressesync netD: multiscale netE: fan netG: modulate netV: resnext ngf: 64 no_TTUR: False no_flip: True no_ganFeat_loss: False no_gaussian_landmark: False no_id_loss: False no_instance: False no_pairing_check: False no_spectrogram: False no_vgg_loss: False noise_pose: True [default: False] norm_A: spectralinstance norm_D: spectralinstance norm_E: spectralinstance norm_G: spectralinstance num_bins_per_frame: 4 num_classes: 5830 num_clips: 1 num_frames_per_clip: 5 num_inputs: 1 onnx: False optimizer: adam output_nc: 3 phase: test pose_dim: 12 positional_encode: False preprocess_mode: resize_and_crop results_dir: ./results/ save_path: ./results/ serial_batches: False start_ind: 0 style_dim: 2560 [default: 2580] style_feature_loss: True [default: False] target_crop_len: 0 train_dis_pose: False train_recognition: False train_sync: False train_word: False trainer: audio use_audio: 1 use_audio_id: 0 use_transformer: False verbose: False vgg_face: False which_epoch: latest word_loss: False
----------------- End -------------------
Traceback (most recent call last):
  File "inference.py", line 107, in main
    inference_single_audio(opt, path_label, model)
    dataloader = data.create_dataloader(opt)
  File "C:\Users\Administrator\PycharmProjects\Talking-Face_PC-AVS\data\__init__.py", line 41, in create_dataloader
    instance.initialize(opt)
  File "C:\Users\Administrator\PycharmProjects\Talking-Face_PC-AVS\data\voxtest_dataset.py", line 96, in initialize
    self.spectrogram = self.audio.audio_to_spectrogram(wav)
  File "C:\Users\Administrator\PycharmProjects\Talking-Face_PC-AVS\config\AudioConfig.py", line 177, in audio_to_spectrogram
    spectrogram = self.melspectrogram(wav).astype(np.float32).T
  File "C:\Users\Administrator\PycharmProjects\Talking-Face_PC-AVS\config\AudioConfig.py", line 107, in melspectrogram
    S = self._amp_to_db(self._linear_to_mel(np.abs(D))) - self.ref_level_db
  File "C:\Users\Administrator\PycharmProjects\Talking-Face_PC-AVS\config\AudioConfig.py", line 144, in _linear_to_mel
  File "C:\Users\Administrator\PycharmProjects\Talking-Face_PC-AVS\config\AudioConfig.py", line 149, in _build_mel_basis
TypeError: mel() takes 0 positional arguments but 2 positional arguments (and 3 keyword-only arguments) were given
misc/Input/517600055 1 misc/Pose_Source/517600078 160 misc/Audio_Source/681600002.mp3 misc/Mouth_Source/681600002 363 dummy
mel() takes 0 positional arguments but 2 positional arguments (and 3 keyword-only arguments) were given
```
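For context: in librosa 0.10, `librosa.filters.mel` (like several other librosa functions) made all of its parameters keyword-only, so the positional call in `_build_mel_basis` that worked on 0.9.x now raises this `TypeError`. A minimal stdlib-only sketch of the mismatch (the `mel` stub below only mimics the 0.10 calling convention; it is not librosa, and the argument values are illustrative):

```python
# Stub mimicking librosa 0.10's filters.mel signature: every parameter
# is keyword-only. The real function takes more parameters; this stub
# only reproduces the calling convention.
def mel(*, sr, n_fft, n_mels=128, fmin=0.0, fmax=None):
    return (sr, n_fft, n_mels)

try:
    # A 0.9.x-style call: two positional arguments plus three keywords,
    # exactly the shape reported in the traceback above.
    mel(16000, 800, n_mels=80, fmin=55, fmax=7600)
except TypeError as err:
    print(err)
    # → mel() takes 0 positional arguments but 2 positional arguments
    #   (and 3 keyword-only arguments) were given
```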

TengTengCai commented 1 year ago

I found the reason: librosa==0.10.0 fails ×××, while librosa==0.9.2 works √√√.
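Besides downgrading to librosa==0.9.2, an alternative is to patch the call in `config/AudioConfig.py` to pass every argument by keyword, which both 0.9.x and 0.10.x accept. A sketch of the idea (the stub below stands in for `librosa.filters.mel`; the parameter values are illustrative, not the repo's actual config):

```python
# Stand-in with librosa 0.10's keyword-only filters.mel signature.
def mel(*, sr, n_fft, n_mels=128, fmin=0.0, fmax=None):
    return {"sr": sr, "n_fft": n_fft, "n_mels": n_mels}

# Positional call (librosa <= 0.9.x only) — TypeError on 0.10.0:
#   mel(16000, 800, n_mels=80, fmin=55, fmax=7600)
# Keyword form — accepted by both 0.9.x and 0.10.x:
basis = mel(sr=16000, n_fft=800, n_mels=80, fmin=55, fmax=7600)
print(basis["n_mels"])  # → 80
```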

Thank You!