MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

Speech analysis features wishlist #1101

Open xaviliz opened 3 years ago

xaviliz commented 3 years ago

Hi,

I noticed that some features and algorithms used in speech processing are not available in Essentia and might be worth implementing. Here are my proposals:

  1. Mel Spectrogram
  2. Delta MFCC
  3. Delta-delta MFCC
  4. Vocal Tract Filtering
  5. Phase Distortion
  6. Phase Distortion Standard Deviation
  7. Harmonic Model Phase Distortion
  8. Pulse Model
  9. Pre-Emphasis Filter
  10. Direction Of Arrival: MUSIC, CSSM, TOPS, SRP-PHAT

Thanks in advance
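
To make items 2, 3 and 9 more concrete, here is a minimal NumPy sketch of what I have in mind. It is not Essentia code: the function names `pre_emphasis` and `delta`, the default coefficient 0.97 and the regression half-width of 2 frames are my own assumptions, chosen because they are common in speech front-ends. The delta uses the usual regression formula over neighbouring frames with edge padding, and delta-delta is simply the delta applied twice.

```python
import numpy as np

def pre_emphasis(signal, coeff=0.97):
    """First-order high-pass: y[n] = x[n] - coeff * x[n-1].

    The first sample is passed through unchanged. `coeff` is an assumed default.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size == 0:
        return signal
    return np.concatenate(([signal[0]], signal[1:] - coeff * signal[:-1]))

def delta(features, width=2):
    """Regression-based delta along time (axis 0) of a (frames, coeffs) matrix.

    d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    Frames beyond the edges are replaced by the nearest valid frame.
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom

# Hypothetical usage on an MFCC matrix of shape (frames, coefficients):
mfcc = np.random.default_rng(0).normal(size=(50, 13))
d_mfcc = delta(mfcc)      # Delta MFCC
dd_mfcc = delta(d_mfcc)   # Delta-delta MFCC
```

Both outputs keep the shape of the input matrix, so they can be stacked next to the static coefficients frame by frame.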

xaviliz commented 3 years ago

Also it would be nice to include:

  1. PLP
  2. I-Vectors
  3. X-Vectors

xaviliz commented 3 years ago

I would also suggest adding an option/parameter to the StartStopSilence algorithm so that it provides a list of start and stop frames. Right now this algorithm provides just one start frame and one stop frame, which is fine for a song. However, in speech you have different events separated by intermediate frames of silence. It would be nice if it could provide a list for startFrame and stopFrame to segment events easily.
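
As a rough illustration of the requested behaviour, here is a NumPy sketch that works on per-frame energies rather than inside Essentia. The function name `active_segments`, the input format (one energy value in dB per frame) and the -60 dB default are assumptions for the example, not the existing algorithm's interface. It returns one start and one inclusive stop frame index per non-silent event.

```python
import numpy as np

def active_segments(frame_energies_db, threshold_db=-60.0):
    """Return (start_frames, stop_frames) for every run of non-silent frames.

    A frame counts as non-silent when its energy is above `threshold_db`.
    Stop indices are inclusive. Names and defaults are hypothetical.
    """
    active = np.asarray(frame_energies_db, dtype=float) > threshold_db
    # Pad with silence on both sides so runs touching the edges are detected.
    padded = np.concatenate(([False], active, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)       # silence -> sound transitions
    stops = np.flatnonzero(edges == -1) - 1   # last sounding frame of each run
    return starts.tolist(), stops.tolist()

# Two speech events separated by silence:
energies = [-80, -20, -20, -80, -80, -10, -80]
start_frames, stop_frames = active_segments(energies)
```

For the example input, the two lists pair up element by element, one pair per event, which is the shape of output I would expect from a list-valued startFrame/stopFrame.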