benfmiller / audalign

Package for aligning audio files through audio fingerprinting
MIT License
84 stars, 2 forks

[Request/Suggestion] Support unpredictable frame drops and unmatching speed/pitch (drift correction) #54

Open alopatindev opened 6 months ago

alopatindev commented 6 months ago

I'm looking for a way to synchronize (potentially destructively) audio tracks from old versions of movies (dubbed in a different language) with the audio from remastered versions.

In my scenario, applying a single audio shift is not enough: sooner or later the tracks drift out of sync, at least due to

Any interest in supporting such a scenario?

Any existing projects that try to solve this problem?

Any ideas on the best way to implement it?


Naive idea for implementation:

Thanks!

benfmiller commented 4 months ago

Sorry for the late reply, and thanks for the suggestion!

Audalign currently has a "locality" feature, which breaks audio files up into segments and aligns them based on the strength of the match between segments (more info in the wiki). This could be used fairly easily to stretch the audio files, but it wouldn't handle frame drops.
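
To illustrate the stretching idea, here is a minimal sketch of how per-segment offsets could be turned into a drift estimate. It does not use audalign's API: `fit_drift` and the `segment_matches` data are hypothetical names and made-up values, and the sketch assumes each segment match yields a (reference time, offset) pair where offset is the target time minus the reference time. It also assumes the drift is linear, so it would not cope with frame drops.

```python
from statistics import mean


def fit_drift(matches):
    """Fit a least-squares line through (reference_time, offset) pairs.

    Returns (rate, base_offset) such that
    offset ~= base_offset + rate * reference_time.
    """
    if len(matches) < 2:
        raise ValueError("need at least two segment matches to estimate drift")
    times = [t for t, _ in matches]
    offsets = [o for _, o in matches]
    t_mean, o_mean = mean(times), mean(offsets)
    variance = sum((t - t_mean) ** 2 for t in times)
    if variance == 0:
        raise ValueError("segment times must not all be equal")
    covariance = sum((t - t_mean) * (o - o_mean) for t, o in zip(times, offsets))
    rate = covariance / variance
    return rate, o_mean - rate * t_mean


# Hypothetical per-segment results, NOT real audalign output:
# (segment start in the reference, measured offset), both in seconds.
segment_matches = [(0.0, 1.00), (60.0, 1.06), (120.0, 1.12), (180.0, 1.18)]

rate, base_offset = fit_drift(segment_matches)

# With offset = target_time - reference_time, the fitted model is
# target_time ~= base_offset + (1 + rate) * reference_time.
stretch_factor = 1.0 + rate
```

Under these assumptions, the target track would be shifted by `base_offset` and time-scaled by `1 / stretch_factor` to follow the reference. Segments whose offsets sit far from the fitted line would need separate handling.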

It looks like AudioAlign's graph/feature is purely based on correlation? I don't have much time to work on this in the near future, but if it's an easy change I'd be happy to work on it. Or, I'd gladly accept pull requests!

silero-vad and whisper look like a neat idea for a new recognizer! For this case, would translated audio segments necessarily line up with word starts and ends? Would translated segments be viable as time markers, or would the shrinking/stretching have to be done based on the background audio?