Open mvoodarla opened 5 months ago
Hey @mvoodarla, great to hear that you're excited about this area and working on it.
I'm building my own custom API for video-to-video translation as well. I've figured out everything from transcribing the audio into phrases with timestamps, to translating those phrases, to running TTS on the translated text.
The one part where I'm really stuck is how to synchronize the dubbed voice exactly with the original video, since the same phrase takes a different amount of time to say in different languages.
Additionally, in case the video contains non-speech elements such as laughing, coughing, background sounds, or music, how do I carry all of this over from the original video to the translated (dubbed) video?
Any help would be highly appreciated. Do you know of any solutions for this?
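For the background-sound part of the question, one commonly used approach is to split the original audio into a speech stem and a background stem with a source-separation tool, then lay the dubbed clips over the background stem at the original timestamps. The sketch below covers only the last step and assumes the separation has already been done elsewhere. The function name `overlay`, the list-of-floats sample format, and the `(start_seconds, samples)` clip tuples are all assumptions made for illustration, not part of this project.

```python
def overlay(background, clips, sample_rate):
    """Mix dubbed clips onto a background track (hypothetical helper).

    background  : list of float samples in [-1.0, 1.0] (music, laughter, ...)
    clips       : list of (start_seconds, samples) for each dubbed phrase
    sample_rate : samples per second, shared by background and clips
    """
    mixed = list(background)  # copy so the input is not modified
    for start_s, samples in clips:
        offset = int(round(start_s * sample_rate))
        for i, sample in enumerate(samples):
            j = offset + i
            if j < 0:
                continue  # clip starts before the track does
            if j >= len(mixed):
                break  # clip runs past the end of the track
            # Sum the two signals and clamp to avoid clipping overflow.
            mixed[j] = max(-1.0, min(1.0, mixed[j] + sample))
    return mixed


# Tiny example: 2 seconds of quiet background at 4 samples per second.
background = [0.1] * 8
dubbed = [(0.5, [0.5, 0.5]), (1.5, [1.0, 1.0, 1.0])]
mixed = overlay(background, dubbed, sample_rate=4)
```

Non-speech sounds such as laughter end up in whichever stem the separation tool assigns them to, so how well they survive depends on that tool rather than on this mixing step.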
Thank you for building this project! I work at a company called Sieve and this is a part of what inspired us to build our Dubbing API. It's a bit different than this as it supports voice cloning, different voice engines, and higher quality translations using other closed-source solutions but it's an example of the bounds of what this tech can do today.
I'd love to contribute our learnings in some way to this project. I think the most challenging part of the problem is how to handle audio speedups and slowdowns across languages. Different applications seem to want different tradeoffs between how tightly the dub stays in sync and how drastic the speed change is allowed to be.
Curious whether improvements in that direction are planned for this project, and whether we can contribute in any way. We would also love feedback on what we've built, as I think it's something the community would love!
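To make the sync versus speedup tradeoff concrete, here is a minimal sketch, not how this project or the Dubbing API actually does it. It assumes each translated phrase has an original time slot and a TTS clip of known duration, and that the application sets a limit on how far playback speed may change. The names `Segment` and `plan_speed` and the default limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float         # seconds, start of the original phrase
    end: float           # seconds, end of the original phrase
    tts_duration: float  # seconds, natural length of the translated TTS clip


def plan_speed(seg, max_speedup=1.3, max_slowdown=0.85):
    """Return (rate, overflow) for one segment.

    rate     : playback-speed multiplier, clamped to the allowed range
    overflow : seconds the clip spills past its slot after the speed change
               (negative means the slot is left with trailing silence)
    """
    slot = seg.end - seg.start
    if slot <= 0:
        raise ValueError("segment must have positive duration")
    ideal = seg.tts_duration / slot  # > 1 means the clip must play faster
    rate = min(max(ideal, max_slowdown), max_speedup)
    played = seg.tts_duration / rate
    return rate, played - slot


# A 3.0 s translated clip that has to fit a 2.0 s slot.
rate, overflow = plan_speed(Segment(start=0.0, end=2.0, tts_duration=3.0))
```

Raising `max_speedup` gives tighter sync at the cost of less natural speech, while lowering it leaves overflow that has to be absorbed some other way, for example by borrowing silence before the next phrase or by shortening the translation.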