Seems like when I try it on music there are quite a lot of pops, but that might just be how IEC A-weighting or the RFFT behaves. Also, why isn't this merged? It's a good pull request.
Some formants (especially sibilants) were being lost by the basic audio matcher, and I suspected this was because the progressive frequency binning was giving too much weight to low-frequency (even infrasonic) components. To that end I added a "weighted" matcher subclass, which keeps all of the original FFT bins and then applies IEC A-weighting to the cosine similarity. I've attached an example of the output for comparison.
https://github.com/ArdenButterfield/stammer/assets/16582285/36759b1c-a32b-4f4d-af01-c2cce975f5fc
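For anyone reading along, here is a rough sketch of what "A-weighting applied to the cosine similarity" could look like. It is not the code from this PR: the function names `a_weighting_gain` and `weighted_cosine_similarity` are made up for illustration, and scaling both magnitude spectra by the weights before taking the cosine is an assumption about how the weighting enters. The gain curve itself is the standard A-weighting formula, normalized to roughly 1.0 at 1 kHz.

```python
import numpy as np

def a_weighting_gain(freqs_hz):
    """Linear A-weighting gain per frequency (about 1.0 at 1 kHz, 0.0 at DC)."""
    f2 = np.asarray(freqs_hz, dtype=float) ** 2
    num = (12194.0 ** 2) * f2 ** 2
    den = (
        (f2 + 20.6 ** 2)
        * np.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.0 dB offset normalizes the curve to 0 dB at 1 kHz.
    return (num / den) * 10 ** (2.0 / 20.0)

def weighted_cosine_similarity(spec_a, spec_b, weights):
    """Cosine similarity of two magnitude spectra after per-bin weighting.

    Hypothetical helper: weighting both vectors is an assumption,
    not necessarily what the matcher subclass does.
    """
    wa = np.asarray(spec_a, dtype=float) * weights
    wb = np.asarray(spec_b, dtype=float) * weights
    denom = np.linalg.norm(wa) * np.linalg.norm(wb)
    return float(np.dot(wa, wb) / denom) if denom > 0 else 0.0

# Example: weights for the bins of a 2048-sample frame at 44.1 kHz.
sample_rate = 44100
frame_len = 2048
freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
weights = a_weighting_gain(freqs)
```

Because the gain falls off steeply below a few hundred Hz, infrasonic and very low bins contribute almost nothing to the similarity, which is the effect described above.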
Also, I noticed that the two-sided numpy FFT was being computed and then sliced down to the positive frequency components; I replaced this with numpy.fft.rfft, which returns the same non-negative-frequency bins for real input and is a little bit faster, with less array slicing.
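A quick way to check the equivalence, as a sketch on a random real-valued frame. The slice `[: n // 2 + 1]` is an assumption that includes the Nyquist bin; the slice in the original code may have differed.

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.standard_normal(1024)  # stand-in for one real-valued audio frame

# Old approach: full two-sided FFT, then keep the non-negative frequencies.
two_sided = np.fft.fft(frame)
positive_half = two_sided[: frame.size // 2 + 1]

# New approach: one-sided FFT for real input.
one_sided = np.fft.rfft(frame)

same = np.allclose(positive_half, one_sided)
```

For a frame of n real samples, rfft returns n // 2 + 1 bins, so any downstream code that assumed a different slice length would need its bin count adjusted.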
Best, Moonjail