ArdenButterfield / stammer

Recreate any audio track by rearranging the frames of another video
MIT License
434 stars, 29 forks

Adds IEC A-weighting and replaces FFT with RFFT #84

Open moonjail opened 5 months ago

moonjail commented 5 months ago

Some formants (especially sibilants) were being lost by the basic audio matcher, and I suspected this was because the progressive frequency binning was giving too much weight to low-frequency (even infrasonic) components. To that end I added a "weighted" matcher subclass, which keeps all of the original FFT bins and then applies IEC A-weighting to the cosine similarity. I've attached an example of the output for comparison.

https://github.com/ArdenButterfield/stammer/assets/16582285/36759b1c-a32b-4f4d-af01-c2cce975f5fc
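
For readers who want a concrete picture of the idea, here is a minimal sketch of an A-weighted cosine similarity between two audio frames. It is not the code from this PR: the function names `a_weighting` and `weighted_cosine_similarity` are hypothetical, and applying the weights to both magnitude spectra before taking the cosine is an assumption about how the weighting enters the comparison. The gain curve is the standard IEC 61672 A-weighting formula, normalized to roughly unity at 1 kHz.

```python
import numpy as np

def a_weighting(freqs_hz):
    """Linear A-weighting gain (IEC 61672 curve), about 1.0 at 1 kHz."""
    f2 = np.asarray(freqs_hz, dtype=float) ** 2
    num = 12194.0**2 * f2**2
    den = (
        (f2 + 20.6**2)
        * np.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    # The +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
    return (num / den) * 10 ** (2.0 / 20)

def weighted_cosine_similarity(frame_a, frame_b, sample_rate):
    """Cosine similarity of A-weighted magnitude spectra (hypothetical helper).

    Assumes both frames are real-valued and have the same length.
    """
    n = len(frame_a)
    weights = a_weighting(np.fft.rfftfreq(n, d=1.0 / sample_rate))
    spec_a = weights * np.abs(np.fft.rfft(frame_a))
    spec_b = weights * np.abs(np.fft.rfft(frame_b))
    denom = np.linalg.norm(spec_a) * np.linalg.norm(spec_b)
    # Silent (all-zero) frames have no direction; treat them as dissimilar.
    return float(spec_a @ spec_b / denom) if denom > 0 else 0.0
```

With this kind of weighting, two frames that share the same high-frequency content but differ in low-frequency rumble still score as close matches, because bins near 20 Hz are attenuated by roughly 50 dB.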

Also, I noticed that the two-sided numpy FFT was being computed and then sliced down to the positive-frequency components. I replaced this with numpy.fft.rfft, which returns the same non-negative-frequency bins for real-valued input and is a little faster, with less array slicing.
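
The equivalence can be checked with a short sketch like the one below. The frame length and the exact slice are assumptions for illustration, not taken from the stammer source. Note that `numpy.fft.rfft` returns `n // 2 + 1` bins, so for even `n` it includes the Nyquist bin; a slice that stopped at `n // 2` would differ by that one bin.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)  # stand-in for one real-valued audio frame

# Two-sided FFT, then keep only the non-negative frequencies (DC through Nyquist).
two_sided_sliced = np.fft.fft(frame)[: len(frame) // 2 + 1]

# One-sided FFT computes those same bins directly.
one_sided = np.fft.rfft(frame)

same = np.allclose(two_sided_sliced, one_sided)
```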

Best, Moonjail

ArdenButterfield commented 5 months ago

This is great! Thank you very much.

RXBBB commented 1 month ago

Seems like when I try this on music there are quite a lot of pops, but that might just be how IEC A-weighting or RFFT behaves. Also, why isn't this merged? It's a good pull request.