Open kchall opened 10 years ago
It should be relatively straightforward, since it sounds like they just ran the signal through a gammatone filterbank, split it into non-overlapping 16 ms windows, and then calculated the Euclidean distance between successive windows.
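As a rough sketch of the windowing and distance steps only (not the filterbank itself), something like the following might work. It assumes the filterbank output is already available as an array of shape `(n_channels, n_samples)`, and it summarizes each window by per-channel RMS amplitude, which is my assumption rather than something confirmed from the paper. The function name `successive_window_distances` is made up for illustration.

```python
import numpy as np

def successive_window_distances(band_signals, sample_rate, window_s=0.016):
    """Euclidean distance between successive non-overlapping windows.

    band_signals: array of shape (n_channels, n_samples), e.g. the output
    of a gammatone filterbank (assumed to be computed elsewhere).
    """
    band_signals = np.asarray(band_signals, dtype=float)
    n_channels, n_samples = band_signals.shape
    win = max(1, int(round(window_s * sample_rate)))  # samples per window
    n_windows = n_samples // win
    # Drop trailing samples that do not fill a whole window.
    trimmed = band_signals[:, :n_windows * win]
    frames = trimmed.reshape(n_channels, n_windows, win)
    # Assumption: summarize each window by per-channel RMS amplitude.
    profiles = np.sqrt(np.mean(frames ** 2, axis=2)).T  # (n_windows, n_channels)
    # Distance between each window's profile and the next one.
    diffs = np.diff(profiles, axis=0)
    return np.sqrt(np.sum(diffs ** 2, axis=1))
```

For a signal with `n_windows` full windows this returns `n_windows - 1` distances, one per pair of adjacent windows.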
I have some Python code for gammatone filterbanks, translated from some MATLAB code, and it looks like it's working as intended in SciPy 0.14 (the outputs weren't lining up with SciPy 0.13, so I think they fixed some bugs in their digital filtering), so it should actually be super simple to implement.
Looking at the paper they actually reference, it looks like they're doing something simpler, but something I don't have implemented. It looks like it all happens in the frequency domain, rather than in the time domain like the gammatone filterbank. It's more similar to the MFCC representation, but uses a rounded exponential filter rather than a triangular filter, and uses ERB and ERB-rate scaling instead of the mel scale. Anyway, it shouldn't be too hard to implement, but it'll take some further research at some point.
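For reference, here is a minimal sketch of the pieces that would be needed. It assumes the commonly used Glasberg and Moore (1990) ERB and ERB-rate formulas and the single-parameter roex(p) filter shape; I haven't checked that the paper uses exactly these variants, and the function names are just placeholders.

```python
import math

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore 1990 form)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    """Frequency in Hz to ERB-rate (number of ERBs below f_hz)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37

def roex_weight(f_hz, fc_hz):
    """roex(p) weight W(g) = (1 + p*g) * exp(-p*g) for a filter centered at fc_hz.

    g is the normalized distance from the center frequency, and p is chosen
    so that the filter's ERB equals erb_hz(fc_hz), i.e. p = 4 * fc / ERB.
    """
    g = abs(f_hz - fc_hz) / fc_hz
    p = 4.0 * fc_hz / erb_hz(fc_hz)
    return (1.0 + p * g) * math.exp(-p * g)
```

Center frequencies could then be spaced evenly on the ERB-rate scale with `erb_rate_to_hz`, and each FFT bin weighted by `roex_weight` in place of the triangular weights used for MFCCs.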
Cool, thanks, Michael! We could also contact Christian Stilp to see if he already has code and/or would be interested in helping us implement it.
Possibly include calculation of a spectral entropy measure? See:
Stilp, C.E., & Kluender, K.R. (2010). Cochlea-scaled spectral entropy, not consonants, vowels, or time, best predicts speech intelligibility. Proceedings of the National Academy of Sciences, 107(27), 12387-12392.
Stilp, C.E. (2014). Information-bearing acoustic change outperforms duration in predicting sentence intelligibility of full-spectrum and noise-vocoded sentences. Journal of the Acoustical Society of America, 135(3), 1518-1529.