ThioJoe / Auto-Synced-Translated-Dubs

Automatically translates the text of a video based on a subtitle file, and then uses AI voice services to create a new dubbed & translated audio track where the speech is synced using the subtitle's timings.
GNU General Public License v3.0
1.56k stars, 156 forks

Voices not stretching when OpenAI models are used. #90

Open · Hiuzuki opened this issue 6 months ago

Hiuzuki commented 6 months ago

I'm using Azure TTS to dub some specific tutorials from English into my language, PT-BR (Brazilian Portuguese), and I'm running into a specific problem. When everything is finished, the audio overlaps at the transitions, as if no stretching or shrinking was done. This only happens when I use the OpenAI voices (which I want to use for an obvious reason: they sound much better). The conventional Azure voices work perfectly without any overlap, but they sound strange. I've tried a bunch of settings in the .ini files, but none of them seem to affect these OpenAI voice models. Any suggestions?

ThioJoe commented 6 months ago

Hm, I wonder if those voices don't support the mstts:audioduration SSML tag. In the meantime, try going into config.ini and setting the option force_stretch_with_twopass = True; I believe that should work. If it doesn't, try flipping the two_pass_voice_synth option to the opposite of its current value and try again. It's probably better to test this with a small file so you don't pay for a whole video's worth of API processing.
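For context on the tag mentioned above, here is a minimal sketch of an Azure SSML request that uses `mstts:audioduration` to ask for a clip of a fixed length. The helper name `build_duration_ssml`, the default `pt-BR` locale, and the example voice are assumptions for illustration, not code from audio_builder.py. If a voice silently ignores this element, the clip comes back at its natural length, which would explain why only some voices overlap.

```python
from xml.sax.saxutils import escape

SPEAK_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "http://www.w3.org/2001/mstts"


def build_duration_ssml(text: str, voice: str, target_ms: int, lang: str = "pt-BR") -> str:
    """Wrap one subtitle line in SSML that requests a fixed output duration.

    Hypothetical helper for illustration only; it is not taken from audio_builder.py.
    """
    quoted = {'"': "&quot;"}  # also escape quotes, since these values sit inside attributes
    return (
        f'<speak version="1.0" xmlns="{SPEAK_NS}" xmlns:mstts="{MSTTS_NS}" '
        f'xml:lang="{escape(lang, quoted)}">'
        f'<voice name="{escape(voice, quoted)}">'
        f'<mstts:audioduration value="{int(target_ms)}ms"/>'
        f"{escape(text)}"  # escape &, < and > in the spoken text
        "</voice></speak>"
    )


# Example: a 2.3-second subtitle line for a standard Azure pt-BR voice.
print(build_duration_ssml("Olá, bem-vindo ao tutorial.", "pt-BR-FranciscaNeural", 2300))
```

The target duration comes straight from the subtitle's start and end times, which is what lets the service do the syncing instead of a local stretch.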

Also, you're using ffmpeg for the stretching, right? I added that option in the past couple of months, and I've found it works better than rubberband.
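As a rough illustration of local stretching with ffmpeg, the sketch below builds an `atempo` filter string that changes a clip's length to a target duration. This is an assumption about how such stretching can be done with ffmpeg in general, not the project's actual code; the function name `atempo_chain` is hypothetical. Each stage is kept within 0.5 to 2.0, the range older ffmpeg builds accept, by chaining several `atempo` filters.

```python
def atempo_chain(clip_ms: float, target_ms: float) -> str:
    """Build an ffmpeg audio filter string that changes a clip's length to target_ms.

    atempo > 1 speeds the audio up (shorter clip); atempo < 1 slows it down.
    Hypothetical helper; each stage stays within 0.5..2.0 by chaining filters.
    """
    if clip_ms <= 0 or target_ms <= 0:
        raise ValueError("durations must be positive")
    factor = clip_ms / target_ms  # overall speed-up needed to hit the target
    stages = []
    while factor > 2.0:
        stages.append(2.0)
        factor /= 2.0
    while factor < 0.5:
        stages.append(0.5)
        factor /= 0.5
    stages.append(factor)  # remaining factor is now within 0.5..2.0
    return ",".join(f"atempo={s:.6f}" for s in stages)


# A 3.0 s clip that must fit a 2.0 s subtitle slot:
print(atempo_chain(3000, 2000))  # atempo=1.500000
```

The resulting string would be passed to ffmpeg as, for example, `ffmpeg -i clip.wav -filter:a "atempo=1.500000" out.wav`. Chaining keeps the sketch compatible with builds that reject single factors outside 0.5 to 2.0.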

Edit: After looking into it, this might not actually work even with the force stretch option set to True, because of some checks in audio_builder.py, so I might have to change that.

Hiuzuki commented 6 months ago

OK, I'll be waiting anxiously.

ThioJoe commented 6 months ago

OK, try replacing your audio_builder.py file with the latest one: https://github.com/ThioJoe/Auto-Synced-Translated-Dubs/blob/main/Scripts/audio_builder.py

Also add this new option anywhere in your config.ini file and make sure it's set to True:

    # This will make it so the audio clips get stretched locally even if the TTS service allows specifying exact duration
    # This could be used when a TTS service like Azure is creating clips of incorrect length, or if certain voices don't support exact length
    # Possible Values: True  |  False (Default)
    force_always_stretch = True

You shouldn't need force_stretch_with_twopass enabled, but if the above doesn't work, try enabling it as well and run it again.
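The decision that `force_always_stretch` controls can be sketched roughly as follows. This is a simplified model, not the actual checks in audio_builder.py: the `[SETTINGS]` section header, the function name `should_stretch_locally`, and the `service_supports_duration` flag are all hypothetical. It only shows the idea that the option overrides the usual "trust the service's exact duration" path.

```python
import configparser

# Hypothetical config excerpt; the section name is an assumption and the
# real config.ini layout may differ.
SAMPLE_CONFIG = """
[SETTINGS]
force_always_stretch = True
"""


def should_stretch_locally(cfg: configparser.ConfigParser, service_supports_duration: bool) -> bool:
    """Decide whether a synthesized clip should be time-stretched locally (hypothetical)."""
    settings = cfg["SETTINGS"]
    if settings.getboolean("force_always_stretch", fallback=False):
        # Stretch even when the service claims to honour exact durations,
        # e.g. voices that ignore mstts:audioduration.
        return True
    # Otherwise only stretch when the service cannot produce exact lengths itself.
    return not service_supports_duration


cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE_CONFIG)
print(should_stretch_locally(cfg, service_supports_duration=True))  # True
```

With the option left at its default of False, a service that supports exact durations would skip local stretching, which matches the overlap behaviour reported for the OpenAI voices.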

Hiuzuki commented 6 months ago

Hello, it's working well now. There are still some small overlaps, but they're practically unnoticeable, and it's dramatically better than before. Thank you very much.

[Screenshot 2024-03-26 205528]

ThioJoe commented 6 months ago

OK, great. For the remaining overlaps, you could try experimenting with the add_line_buffer_milliseconds setting in config.ini, which adds a bit of extra space between the clips.
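To see which clips still collide and roughly how large a buffer would help, a small overlap check like the one below can be run over the clip start times and final durations. This is an illustrative sketch only; the `Clip` type and `find_overlaps` function are hypothetical, and `buffer_ms` loosely models the gap a setting like add_line_buffer_milliseconds aims for rather than reproducing how the project applies it.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    start_ms: int     # subtitle start time
    duration_ms: int  # length of the synthesized (and possibly stretched) audio


def find_overlaps(clips: list[Clip], buffer_ms: int = 0) -> list[tuple[int, int]]:
    """Return (index, overlap_ms) for each clip that runs into the next one.

    buffer_ms is treated as a gap that must stay free after each clip.
    Hypothetical helper for illustration.
    """
    ordered = sorted(range(len(clips)), key=lambda i: clips[i].start_ms)
    overlaps = []
    for a, b in zip(ordered, ordered[1:]):
        end = clips[a].start_ms + clips[a].duration_ms + buffer_ms
        if end > clips[b].start_ms:
            overlaps.append((a, end - clips[b].start_ms))
    return overlaps


clips = [Clip(0, 1200), Clip(1000, 500), Clip(2000, 900)]
print(find_overlaps(clips))  # [(0, 200)]
```

The reported overlap sizes give a rough idea of how many extra milliseconds of spacing would be needed before a remaining collision disappears.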