Closed. Frederieke93 closed this issue 1 year ago.
@Frederieke93 you need to use `speaker_encoder_as_loss` to do VC and use d-vector files while training.
Thank you for your response. However, I thought it would also be possible to do voice conversion with the normal VITS model that doesn't use the speaker encoder (and doesn't use a d-vector). Is that not correct?
@Edresson

> Thank you for your response. However, I thought it would also be possible to do voice conversion with the normal VITS model that doesn't use the speaker encoder (and doesn't use a d-vector). Is that not correct?
Yeah, indeed it is a bug. PR #2187 fixes this issue.
Example of the command to do voice conversion using the released VCTK VITS model:

```shell
tts --model_name "tts_models/en/vctk/vits" --reference_wav p226_001_mic1.flac --reference_speaker_idx "p226" --speaker_idx "p225"
```
Similar issue/bug when using the Python API. Is it by any chance related, @Edresson? When running this code:
```python
tts = TTS("tts_models/en/vctk/vits")
tts.tts_with_vc_to_file(
    text=text_input,
    speaker_wav="target/speaker.wav",
    file_path="output/vits.wav",
)
```
Receiving this error:
```
/usr/local/lib/python3.9/dist-packages/TTS/api.py in _check_arguments(self, speaker, language, speaker_wav, emotion, speed)
    428     # check for the coqui tts models
    429     if self.is_multi_speaker and (speaker is None and speaker_wav is None):
--> 430         raise ValueError("Model is multi-speaker but no `speaker` is provided.")
    431     if self.is_multi_lingual and language is None:
    432         raise ValueError("Model is multi-lingual but no `language` is provided.")

ValueError: Model is multi-speaker but no `speaker` is provided.
```
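The error is consistent with `speaker_wav` never reaching the TTS stage: presumably `tts_with_vc_to_file()` reserves `speaker_wav` for the voice-conversion stage, so the TTS stage sees neither `speaker` nor `speaker_wav` and the check fires. A minimal sketch, assuming a simplified standalone version of the `_check_arguments` logic shown in the traceback (the function below and its signature are my own, not the real API):

```python
def check_arguments(is_multi_speaker, speaker=None, speaker_wav=None):
    """Simplified stand-in for the check in TTS/api.py (see traceback above)."""
    if is_multi_speaker and speaker is None and speaker_wav is None:
        raise ValueError("Model is multi-speaker but no `speaker` is provided.")

# The VCTK VITS model is multi-speaker, and the TTS stage of
# tts_with_vc_to_file() runs without speaker or speaker_wav:
try:
    check_arguments(is_multi_speaker=True)
except ValueError as e:
    print(e)  # Model is multi-speaker but no `speaker` is provided.
```

Passing either `speaker` or `speaker_wav` to the TTS stage would satisfy the check, which is why the reporter's next attempt adds a `speaker` argument.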
But after adding the `speaker` parameter:
```python
tts = TTS("tts_models/en/vctk/vits")
tts.tts_with_vc_to_file(
    text=text_input,
    speaker=tts.speakers[7],
    speaker_wav="target/speaker.wav",
    file_path="output/vits.wav",
)
```
The parameter is not recognized:
```
TypeError                                 Traceback (most recent call last)
<ipython-input-25-5194fa66d3f0> in <cell line: 2>()
      1 tts = TTS("tts_models/en/vctk/vits")
----> 2 tts.tts_with_vc_to_file(
      3     text=text_input,
      4     speaker=tts.speakers[7],
      5     speaker_wav="target/speaker.wav",

TypeError: tts_with_vc_to_file() got an unexpected keyword argument 'speaker'
```
No, it is not related, and it is not a bug. `tts_with_vc_to_file()` is not designed to do voice conversion on its own. It is designed to generate speech and then convert it to a new voice using a voice conversion model like FreeVC. It also does not support multi-speaker models (the `speaker` parameter is not implemented). If you want to do voice conversion, use the `voice_conversion_to_file()` method directly. However, that will not work with the VITS model, because it is a traditional multi-speaker model and cannot take a reference wav and copy the speaker characteristics from it.
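To illustrate the design being described, here is a toy sketch of the two-stage pipeline that `tts_with_vc_to_file()` implements; the function and its callables are hypothetical stand-ins, not the real Coqui TTS internals. The key point is that `speaker_wav` is only consumed by the second (voice-conversion) stage:

```python
def tts_with_vc(text, speaker_wav, synthesize, convert_voice):
    """Toy two-stage pipeline: TTS first, then voice conversion.

    speaker_wav is only seen by the VC stage -- the TTS stage runs
    without any speaker information, which is why this method does
    not fit a multi-speaker TTS model.
    """
    wav = synthesize(text)                  # stage 1: plain single-speaker TTS
    return convert_voice(wav, speaker_wav)  # stage 2: VC model (e.g. FreeVC)

# Toy stand-ins that just record the data flow as strings:
result = tts_with_vc(
    "hello",
    "target/speaker.wav",
    synthesize=lambda text: f"wav({text})",
    convert_voice=lambda wav, ref: f"vc({wav}, {ref})",
)
print(result)  # vc(wav(hello), target/speaker.wav)
```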
Describe the bug
Hi all! I've been fine-tuning the VITS model on my own dataset, which has two speakers. After training, I wanted to do voice conversion from speaker 1 (`speaker_idx`) to speaker 2 (`reference_speaker_idx`) with a `reference_wav` (from speaker 2). I tried synthesizing as follows:
However, I get the following error message: `TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not int`, which is raised in the following file:
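For context on that `TypeError`: it is what PyTorch raises when an embedding layer receives a plain Python int instead of a tensor, which suggests the speaker index is being passed through unconverted somewhere. A small standalone reproduction using plain PyTorch (not the TTS code itself):

```python
import torch

emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)

# Passing a raw int reproduces the error from the report:
try:
    emb(3)
except TypeError as e:
    print(e)  # -> same "must be Tensor, not int" TypeError as in the report

# Wrapping the index in a tensor is the usual fix:
out = emb(torch.tensor([3]))
print(out.shape)  # torch.Size([1, 4])
```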
To Reproduce
I did finetuning with the following config: config_vits_v2.json
Expected behavior
The expected behavior was an output audio file in which the content of the reference_wav was spoken by speaker 1.
Logs
No response
Environment
Additional context
Thank you for helping me out!!