PlayVoice / lora-svc

singing voice change based on whisper, and lora for singing voice clone
MIT License

"HifiGAN model file is not found!" error when running audio quality enhancement #46

Closed. fatinghenji closed this issue 1 year ago

fatinghenji commented 1 year ago

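[image] As shown in the screenshot above, I placed the relevant files according to the instructions, but running the script fails with the following error: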

(lora-svc) PS G:\AI\lora-svc> python svc_val_nsf_hifigan.py
| Hparams chains:  ['nsf_hifigan/configs/basics/base.yaml', 'nsf_hifigan/configs/basics/fs2.yaml', 'nsf_hifigan/configs/acoustic/nomidi.yaml']
| Hparams: 
K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 128, audio_sample_rate: 44100, base_config: ['nsf_hifigan/configs/basics/fs2.yaml'], 
binarization_args: {'shuffle': True, 'with_txt': True, 'with_wav': False, 'with_align': True, 'with_spk_embed': False, 'with_f0': True, 'with_f0cwt': True}, binarizer_cls: data_gen.acoustic.AcousticBinarizer, binary_data_dir: data/opencpop/binary, check_val_every_n_epoch: 10, clip_grad_norm: 1, 
content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2, cwt_loss: l1, 
cwt_std_scale: 0.8, datasets: ['opencpop'], debug: False, dec_ffn_kernel_size: 9, dec_layers: 4, 
decay_steps: 50000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet, diff_loss_type: l2, 
dilation_cycle_length: 4, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'], dur_loss: mse, 
dur_predictor_kernel: 3, dur_predictor_layers: 2, enc_ffn_kernel_size: 9, enc_layers: 4, encoder_K: 8, 
encoder_type: fft, endless_ds: True, f0_embed_type: continuous, ffn_act: gelu, ffn_padding: SAME, 
fft_size: 2048, fmax: 16000, fmin: 40, g2p_dictionary: nsf_hifigan/na.txt, gamma: 0.5, 
gaussian_start: True, gen_dir_name: , gen_tgt_spk_id: -1, hidden_size: 256, hop_size: 512, 
infer: False, keep_bins: 128, lambda_commit: 0.25, lambda_energy: 0.0, lambda_f0: 0.0, 
lambda_ph_dur: 0.0, lambda_sent_dur: 0.0, lambda_uv: 0.0, lambda_word_dur: 0.0, load_ckpt: , 
log_interval: 100, loud_norm: False, lr: 0.0004, max_beta: 0.02, max_epochs: 1000, 
max_eval_sentences: 1, max_eval_tokens: 60000, max_frames: 8000, max_input_tokens: 1550, max_sentences: 48, 
max_tokens: 80000, max_updates: 320000, mel_loss: ssim:0.5|l1:0.5, mel_vmax: 1.5, mel_vmin: -6.0, 
min_level_db: -120, norm_type: gn, num_ckpt_keep: 3, num_heads: 2, num_sanity_val_steps: 1,
num_spk: 1, num_test_samples: 0, num_valid_plots: 10, optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98,
original_g2p_dictionary: nsf_hifigan/na.txt, out_wav_norm: False, permanent_ckpt_interval: 40000, permanent_ckpt_start: 120000, pitch_ar: False,
pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l1, pitch_norm: log, pitch_type: frame,
pndm_speedup: 10, pre_align_args: {'use_tone': True, 'forced_align': 'mfa', 'use_sox': False, 'txt_processor': 'en', 'allow_no_txt': False, 'denoise': False}, pre_align_cls: , predictor_dropout: 0.5, predictor_gpredictor_predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256,
pretrain_fs_ckpt: , processed_data_dir: , profile_infer: False, raw_data_dir: data/opencpop/raw, ref_norm_layer: bn,
rel_pos: True, reset_phone_dict: True, residual_channels: 384, residual_layers: 20, save_best: False,
save_ckpt: True, save_codes: ['configs', 'modules', 'src', 'utils'], save_f0: True, save_gt: False, schedule_type: linear,
seed: 1234, sort_by_len: True, spec_max: [0], spec_min: [-5], spk_cond_steps: [],
stop_token_weight: 5.0, task_cls: src.naive_task.NaiveTask, test_ids: [], test_input_dir: , test_num: 0,
test_prefixes: ['2044', '2086', '2092', '2093', '2100'], test_set_name: test, timesteps: 1000, train_set_name: train, use_denoise: False,
use_energy_embed: False, use_gt_dur: False, use_gt_f0: False, use_key_shift_embed: False, use_midi: False,
use_nsf: True, use_pitch_embed: True, use_pos_embed: True, use_speed_embed: False, use_spk_embed: False,
use_spk_id: False, use_split_spk_id: False, use_uv: False, use_var_enc: False, val_check_interval: 2000,
valid_num: 0, valid_set_name: valid, validate: False, vocoder: NsfHifiGAN, vocoder_ckpt: nsf_hifigan_pretrain/nsf_hifigan/model,
warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 2048, work_dir: ,

Traceback (most recent call last):
  File "G:\AI\lora-svc\svc_val_nsf_hifigan.py", line 56, in <module>
    vocoder = NsfHifiGAN()
  File "G:\AI\lora-svc\nsf_hifigan\src\vocoders\nsf_hifigan.py", line 18, in __init__
    assert os.path.exists(model_path), 'HifiGAN model file is not found!'
AssertionError: HifiGAN model file is not found!

MaxMax2016 commented 1 year ago

nsf_hifigan_pretrain/
├── README.md
└── nsf_hifigan
    ├── config.json
    ├── model
    ├── NOTICE.txt
    └── NOTICE.zh-CN.txt
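A quick way to compare a local checkout against the layout above is a small check script. This is only a sketch: `EXPECTED_FILES` and `find_missing` are hypothetical names, not part of lora-svc, and it assumes the relative `vocoder_ckpt` path from the log (`nsf_hifigan_pretrain/nsf_hifigan/model`) is resolved against the directory the script is run from. It uses an existence check, the same kind of test as the `os.path.exists` assertion in the traceback.

```python
import os
from pathlib import Path

# Relative paths taken from the directory listing above.
# The second entry matches the vocoder_ckpt value printed in the log.
EXPECTED_FILES = [
    "nsf_hifigan_pretrain/nsf_hifigan/config.json",
    "nsf_hifigan_pretrain/nsf_hifigan/model",
]


def find_missing(root="."):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumption: the script is started from the repository root,
    # as in the session shown in the issue.
    cwd = os.getcwd()
    missing = find_missing(cwd)
    if missing:
        print(f"Missing under {cwd}:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected vocoder files are present.")
```

If the script reports `nsf_hifigan_pretrain/nsf_hifigan/model` as missing, it may be worth checking whether the archive was extracted one level too deep or whether the file was saved under a different name.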