RedHenLab / multi-modal-emotion-prediction

utterance segmentation #1

Open amirim opened 6 years ago

amirim commented 6 years ago

Many thanks for the contribution. Utterance segmentation is not part of your work, since the IEMOCAP emotion dataset is already segmented into utterances, but do you know of any tool that would be a good solution for this purpose?

ksingla025 commented 5 years ago

I suspect that you are trying to pass in a new transcript, which is why you need utterance segmentation. If you are using an ASR system to transcribe the audio, it can usually give you long pauses and speaker changes, which you can use as utterance boundaries.
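
As a rough illustration of that idea, here is a minimal sketch that groups ASR words into utterances, assuming the ASR output has already been converted into word-level timestamps with optional speaker labels. The `Word` class, the `segment_utterances` function, and the 0.5 second pause threshold are all hypothetical choices for this example, not part of this repository or of any particular ASR toolkit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One recognized word with timing (hypothetical container, not an ASR API)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    speaker: Optional[str] = None  # speaker label, if diarization is available


def segment_utterances(words: List[Word], max_pause: float = 0.5) -> List[List[Word]]:
    """Split a time-ordered word list at long pauses or speaker changes.

    max_pause is an assumed threshold in seconds; tune it for your data.
    """
    utterances: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        if current:
            prev = current[-1]
            long_pause = (word.start - prev.end) > max_pause
            speaker_change = word.speaker != prev.speaker
            if long_pause or speaker_change:
                # Close the current utterance and start a new one.
                utterances.append(current)
                current = []
        current.append(word)
    if current:
        utterances.append(current)
    return utterances


if __name__ == "__main__":
    demo = [
        Word("hi", 0.0, 0.3, "A"),
        Word("there", 0.35, 0.7, "A"),
        Word("hello", 0.8, 1.2, "B"),
        Word("how", 2.0, 2.2, "B"),
        Word("are", 2.25, 2.4, "B"),
    ]
    for utt in segment_utterances(demo):
        print(utt[0].speaker, " ".join(w.text for w in utt))
```

If your ASR does not provide speaker labels, leave `speaker` as `None` for every word and the split falls back to pauses only.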

sambhavnoobcoder commented 6 months ago

Several tools and libraries commonly used in automatic speech recognition (ASR) can also help with utterance segmentation:

Kaldi: A popular open-source toolkit for speech recognition. It provides scripts and modules for speech data processing, feature extraction, and acoustic modeling, which can be used for utterance segmentation.

HTK (Hidden Markov Model Toolkit): Another toolkit commonly used for building ASR systems. Its HMM-based acoustic modeling can be adapted for segmentation tasks.

Praat: Primarily used in phonetics, but it also supports annotating and segmenting speech. You can segment a speech signal manually and mark the boundaries between utterances.

LibROSA and librosa.segment functions: LibROSA is a Python library for audio and music analysis. It is not an ASR tool, but it provides audio processing, feature extraction, and segmentation. The librosa.segment module is aimed mainly at structural segmentation of music; for speech, librosa.effects.split, which returns the non-silent intervals of a signal, is likely the more directly useful function.

Google Speech-to-Text API: Provides automatic transcription and can return word-level time offsets and, with speaker diarization enabled, speaker labels. Utterance boundaries can then be derived from pauses or speaker turns.

CMU Sphinx: An open-source speech recognition system with tools and libraries for recognition tasks. It may offer utilities for segmentation within its suite of tools.
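
If you only have raw audio and no ASR output, the simplest pause-based approach is to split on low-energy regions. The sketch below is not taken from any of the tools above; it is a plain-Python illustration of energy-based silence splitting, assuming mono samples scaled to [-1, 1]. The function names, the 20 ms frame size, the 0.02 RMS threshold, and the 300 ms minimum silence are all assumptions you would need to tune.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive non-overlapping frames (trailing partial frame dropped)."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def split_on_silence(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: int = 20,         # assumed frame size
    threshold: float = 0.02,    # assumed RMS level below which a frame is "silent"
    min_silence_ms: int = 300,  # assumed minimum pause that ends an utterance
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of segments separated by long silences."""
    frame_len = int(sample_rate * frame_ms / 1000)
    min_silent_frames = max(1, min_silence_ms // frame_ms)
    rms = frame_rms(samples, frame_len)

    segments: List[Tuple[float, float]] = []
    start = None     # index of the first frame of the current segment
    silent_run = 0   # number of consecutive silent frames seen inside a segment
    for i, energy in enumerate(rms):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                end = i - silent_run + 1  # index of the first silent frame
                segments.append((start * frame_len / sample_rate,
                                 end * frame_len / sample_rate))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended inside a segment; trim any short trailing silence.
        end = len(rms) - silent_run
        segments.append((start * frame_len / sample_rate,
                         end * frame_len / sample_rate))
    return segments


if __name__ == "__main__":
    # Synthetic signal at 1 kHz: 0.2 s of sound, 0.4 s of silence, 0.2 s of sound.
    signal = [0.5] * 200 + [0.0] * 400 + [0.5] * 200
    print(split_on_silence(signal, sample_rate=1000))
```

A fixed energy threshold is fragile on noisy recordings, so for real data a proper voice activity detector or one of the toolkits listed above is probably a better choice.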