Unable to do voice cloning with Arabic

uni-saurabh-vyas commented 1 month ago

Error:

*** TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not NoneType

Code:

import os

import torch
from InferenceInterfaces.ToucanTTSInterface import ToucanTTSInterface

lang = "ara"
tts = ToucanTTSInterface(device="cuda" if torch.cuda.is_available() else "cpu", tts_model_path="Meta", language=lang)

input_text = "<something in arabic script>"

# Folder with the speaker reference audio files, and the output folder
speaker_reference_folder = "/mnt/efs/saurabh/saurabh/tools/xtts/src_speakers_hindi3"
dst_dir = "/mnt/efs/jeena/to_saurabh/Hindi_phrases_sents/data/ims/test"

# Loop through the speaker reference audio files in the folder
for file_name in os.listdir(speaker_reference_folder):
    if file_name.endswith('.wav'):
        speaker_reference = os.path.join(speaker_reference_folder, file_name)

        print(speaker_reference)

        # Set the speaker embedding to clone the voice
        tts.set_utterance_embedding(speaker_reference)

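        # Synthesize speech with the cloned voice, writing one output file per
        # reference speaker so that earlier results are not overwritten
        output_file_name = os.path.join(dst_dir, f"cloned_{file_name}")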

        tts.read_to_file(text_list=[input_text], file_location=output_file_name)

del tts

It works for English and Hindi, but for Arabic it throws the error above. Any thoughts? I verified that nothing is wrong with the audio file or the input text.

Flux9665 commented 1 month ago

"ara" is not a valid language ID; it doesn't exist. The ISO 639-3 code for Standard Arabic is "arb".
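
One plausible reading of the traceback is that the lookup for the unknown ID returned None, and that None was then passed on to the embedding layer. The sketch below shows how a check before model construction could fail early with a clearer message. It is an illustration only: `SUPPORTED_LANGUAGE_IDS`, `MACROLANGUAGE_HINTS`, and `resolve_language_id` are hypothetical names and values, not part of the toolkit.

```python
# Hypothetical subset of supported ISO 639-3 IDs. The indices are made up
# for illustration; the toolkit's real list is not reproduced here.
SUPPORTED_LANGUAGE_IDS = {"eng": 0, "hin": 1, "arb": 2}

# Illustrative hints from a macrolanguage code to a specific language code.
MACROLANGUAGE_HINTS = {"ara": "arb"}


def resolve_language_id(lang: str) -> int:
    """Return the index for a language code, or raise a descriptive error."""
    index = SUPPORTED_LANGUAGE_IDS.get(lang)  # None for unknown codes
    if index is None:
        hint = MACROLANGUAGE_HINTS.get(lang)
        suggestion = f' Did you mean "{hint}"?' if hint else ""
        raise ValueError(f'Unknown language ID "{lang}".{suggestion}')
    return index
```

Calling `resolve_language_id("ara")` in this sketch raises a `ValueError` that suggests "arb", instead of letting a None value reach the model.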

uni-saurabh-vyas commented 1 month ago

Thanks, my bad. Now it's generating the audio, but I'm getting the following warning (I hope it's not a major problem): 'str' object has no attribute 'removeprefix'

Flux9665 commented 1 month ago

I have never encountered the warning, but as long as it's generating audio, it should be fine. Arabic is among the languages for which we had no training data, so there might be improvements in the upcoming version (hopefully released this week).
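
A note on the warning: `str.removeprefix` was added in Python 3.9, so the message is consistent with the script running on an older interpreter. Where exactly the call happens is not established in this thread. A minimal sketch of an equivalent helper for older Python versions follows; `remove_prefix` is a hypothetical name, not a function from the toolkit.

```python
def remove_prefix(text: str, prefix: str) -> str:
    """Behave like str.removeprefix, which only exists in Python 3.9+."""
    # Strip the prefix only if the string actually starts with it;
    # an empty prefix leaves the string unchanged.
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text
```

Upgrading to Python 3.9 or newer would likely make the warning disappear without any such helper.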

DigitalPhonetics / IMS-Toucan

unable to do voice cloning with arabic #179