Closed faizulhaque closed 1 year ago
I tried other models. The results are the same.
tts --out_path hello.wav --speaker_wav hello.wav --text "This is a demo." --model_name "tts_models/en/ljspeech/speedy-speech"
> Downloading model to /Users/faizulhaque/Library/Application Support/tts/tts_models--en--ljspeech--speedy-speech
100%|██████████| 53.4M/53.4M [00:03<00:00, 15.4MiB/s]
> Model's license - apache 2.0
> Check https://choosealicense.com/licenses/apache-2.0/ for more info.
> vocoder_models/en/ljspeech/hifigan_v2 is already downloaded.
> Using model: speedy_speech
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:False
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000.0
| > pitch_fmin:1.0
| > pitch_fmax:640.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:2.718281828459045
| > hop_length:256
| > win_length:1024
> Vocoder Model: hifigan
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.0
| > griffin_lim_iters:60
| > signal_norm:False
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000.0
| > pitch_fmin:1.0
| > pitch_fmax:640.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:False
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:2.718281828459045
| > hop_length:256
| > win_length:1024
> Generator Model: hifigan_generator
> Discriminator Model: hifigan_discriminator
Removing weight norm...
> Text: This is a demo.
> Text splitted to sentences.
['This is a demo.']
Traceback (most recent call last):
  File "/opt/homebrew/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/lib/python3.10/site-packages/TTS/bin/synthesize.py", line 396, in main
    wav = synthesizer.tts(
  File "/opt/homebrew/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 316, in tts
    speaker_embedding = self.tts_model.speaker_manager.compute_embedding_from_clip(speaker_wav)
AttributeError: 'NoneType' object has no attribute 'compute_embedding_from_clip'
tts --out_path hello.wav --speaker_wav hello.wav --text "This is a demo." --model_name "tts_models/en/ek1/tacotron2"
> tts_models/en/ek1/tacotron2 is already downloaded.
> vocoder_models/en/ek1/wavegrad is already downloaded.
> Using model: Tacotron2
> Setting up Audio Processor...
| > sample_rate:22050
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:-10
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:0
| > fft_size:1024
| > power:1.8
| > preemphasis:0.99
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000.0
| > pitch_fmin:1.0
| > pitch_fmax:640.0
| > spec_gain:1.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > do_rms_norm:False
| > db_level:None
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
> Model's reduction rate `r` is set to: 2
> Vocoder Model: wavegrad
> Text: This is a demo.
> Text splitted to sentences.
['This is a demo.']
Traceback (most recent call last):
  File "/opt/homebrew/bin/tts", line 8, in <module>
    sys.exit(main())
  File "/opt/homebrew/lib/python3.10/site-packages/TTS/bin/synthesize.py", line 396, in main
    wav = synthesizer.tts(
  File "/opt/homebrew/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 316, in tts
    speaker_embedding = self.tts_model.speaker_manager.compute_embedding_from_clip(speaker_wav)
AttributeError: 'NoneType' object has no attribute 'compute_embedding_from_clip'
Those are not multispeaker models. Check recipe folder for multilanguage models.
@p0p4k I was just trying to clone a single speaker's voice using --speaker_wav and --text.
Yes, those models do not have speaker embeddings (that is, they have no information for multi-speaker transfer).
This model is a single-speaker model. It doesn't work with reference audio.
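To illustrate why the traceback ends in an AttributeError, here is a minimal, self-contained sketch. It does not use the TTS library itself: `SpeakerManager`, `Model`, and `speaker_embedding_or_none` are hypothetical stand-ins written for this example. The only detail taken from the logs above is that a single-speaker model has `speaker_manager` set to `None`, so calling `compute_embedding_from_clip` on it fails. The guard shows one way such a call could be checked first, under those assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


class SpeakerManager:
    """Hypothetical stand-in for a multi-speaker model's speaker manager."""

    def compute_embedding_from_clip(self, wav_path: str) -> List[float]:
        # A real manager would run a speaker encoder on the clip;
        # this stub just returns a fixed placeholder vector.
        return [0.0, 0.0, 0.0]


@dataclass
class Model:
    """Hypothetical stand-in for a loaded TTS model."""

    name: str
    # Assumption based on the traceback: None for single-speaker models.
    speaker_manager: Optional[SpeakerManager] = None


def speaker_embedding_or_none(model: Model, speaker_wav: Optional[str]):
    """Return an embedding only when the model can use a reference clip."""
    if speaker_wav is None:
        return None
    if model.speaker_manager is None:
        # Fail with a readable message instead of an AttributeError on None.
        raise ValueError(
            f"{model.name} is a single-speaker model and cannot use --speaker_wav"
        )
    return model.speaker_manager.compute_embedding_from_clip(speaker_wav)


single = Model("tts_models/en/ljspeech/speedy-speech")
try:
    speaker_embedding_or_none(single, "hello.wav")
except ValueError as err:
    message = str(err)
    print(message)
```

With a single-speaker model, dropping --speaker_wav from the command avoids this code path entirely.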
Describe the bug
Unable to find compute_embedding_from_clip
To Reproduce
tts --text "This is a demo text." --speaker_wav "my_voice.wav"
Expected behavior
No response
Logs
Environment
Additional context
No response