Speech analysis features wishlist

MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings

http://essentia.upf.edu

GNU Affero General Public License v3.0


Speech analysis features wishlist #1101

Open xaviliz opened 3 years ago

xaviliz commented 3 years ago

Hi,

I noticed that several features and algorithms commonly used in speech processing are not available in Essentia and might be worth implementing. Here are my proposals:

Mel Spectrogram
Delta MFCC
Delta-delta MFCC
Vocal Tract Filtering
Phase Distortion
Phase Distortion Standard Deviation
Harmonic Model Phase Distortion
Pulse Model
Pre-Emphasis Filter
Direction Of Arrival: MUSIC, CSSM, TOPS, SRP-PHAT
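
To make two of the simpler items concrete (Pre-Emphasis Filter and Delta / Delta-delta MFCC), here is a minimal pure-Python sketch. It is not Essentia code and does not reflect any existing Essentia API: the function names `pre_emphasis` and `delta`, the default coefficient 0.97, and the edge handling (repeating the first and last frame) are assumptions chosen for illustration. The delta uses the common regression formula d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2).

```python
def pre_emphasis(signal, coeff=0.97):
    """First-order high-pass: y[n] = x[n] - coeff * x[n-1], with y[0] = x[0].

    The default coeff is a commonly used value, assumed here for illustration.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out


def delta(frames, width=2):
    """Regression-based delta features over +/- `width` frames.

    `frames` is a list of per-frame coefficient lists (e.g. MFCCs).
    Edges are handled by repeating the first/last frame (an assumption).
    """
    if not frames:
        return []
    denom = 2 * sum(k * k for k in range(1, width + 1))
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    result = []
    for t in range(n_frames):
        row = []
        for c in range(n_coeffs):
            acc = 0.0
            for k in range(1, width + 1):
                nxt = frames[min(t + k, n_frames - 1)][c]
                prv = frames[max(t - k, 0)][c]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        result.append(row)
    return result


# Toy input: one coefficient per frame, rising linearly.
mfcc = [[0.0], [1.0], [2.0], [3.0], [4.0]]
d1 = delta(mfcc, width=1)   # delta
d2 = delta(d1, width=1)     # delta-delta is the delta of the delta
```

Delta-delta needs no separate algorithm in this sketch: it is the same function applied twice, so a single `width` parameter would cover both proposals.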

Thanks in advance

xaviliz commented 3 years ago

Also it would be nice to include:

xaviliz commented 3 years ago

I would also suggest adding an option/parameter to the StartStopSilence algorithm so that it provides a list of start and stop frames. Right now this algorithm provides just one start frame and one stop frame, which is fine for a song. However, in speech you have different events separated by intermediate silent frames. It would be nice if it could return lists of startFrame and stopFrame values so that events can be segmented easily.
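
A rough sketch of the behaviour I have in mind, in plain Python rather than Essentia code. The names `frame_power_db` and `find_segments` and the -60 dB threshold are hypothetical, picked only for this example; a frame counts as silent when its mean power falls below the threshold.

```python
import math


def frame_power_db(frame):
    """Mean power of one frame in dB; -inf for an empty or all-zero frame."""
    power = sum(x * x for x in frame) / len(frame) if frame else 0.0
    return 10.0 * math.log10(power) if power > 0 else float("-inf")


def find_segments(frames, threshold_db=-60.0):
    """Return (start_frames, stop_frames) for every run of non-silent frames.

    Indices are inclusive: frames[start] .. frames[stop] are all non-silent.
    """
    start_frames, stop_frames = [], []
    start = None
    for i, frame in enumerate(frames):
        silent = frame_power_db(frame) < threshold_db
        if not silent and start is None:
            start = i                      # a new event begins
        elif silent and start is not None:
            start_frames.append(start)     # the event ended on the previous frame
            stop_frames.append(i - 1)
            start = None
    if start is not None:                  # signal ends while an event is active
        start_frames.append(start)
        stop_frames.append(len(frames) - 1)
    return start_frames, stop_frames


# Toy signal: silence, a 2-frame event, silence, a 1-frame event.
quiet, loud = [0.0, 0.0], [0.5, -0.5]
starts, stops = find_segments([quiet, loud, loud, quiet, quiet, loud])
```

With a single event, the two lists each contain one element, so the current single startFrame/stopFrame behaviour would remain a special case of the proposed output.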