Closed: iyaja closed this pull request 1 year ago.
@pseeth, could you also review this PR? There are things I lack context on, for instance whether we should implement this as a Mixin class.
Thank you, @pseeth!
This PR adds methods to obtain Whisper input features, embeddings, and transcripts from an `AudioSignal`:

- Adds a `WhisperMixin` class with methods to set up the Whisper model, obtain input features, generate transcripts, and extract embeddings.
- `setup_whisper()` initializes the Whisper model and processor using the `transformers` library.
- `get_whisper_features()` resamples the input signal to the required sampling rate, processes the raw speech, and returns the input features.
- `get_whisper_embeddings()` extracts embeddings from the input features using the Whisper encoder.
- `get_whisper_transcript()` generates a transcript from the input features and decodes it into text.

This new functionality allows developers to use the Whisper model for a wide range of audio processing tasks within the `audiotools` library.
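The methods described in this PR can be sketched as a mixin roughly as follows. This is a minimal, hypothetical sketch, not the code in this PR. It assumes the host class exposes `audio_data` with shape (batch, channels, samples) on the CPU, an integer `sample_rate`, and a `resample(sample_rate)` method that returns a resampled signal. It also assumes the Hugging Face `transformers` classes `WhisperProcessor` and `WhisperForConditionalGeneration`. The checkpoint name `openai/whisper-base.en`, the `device` argument, and the `_ensure_whisper` helper are illustrative choices.

```python
import numpy as np


class WhisperMixin:
    """Hypothetical sketch of Whisper helpers that a host audio class can inherit.

    Assumes the host class provides ``audio_data`` with shape
    (batch, channels, samples) on the CPU, an integer ``sample_rate``, and a
    ``resample(sample_rate)`` method that returns a resampled signal.
    """

    whisper_processor = None
    whisper_model = None

    def setup_whisper(self, model_name="openai/whisper-base.en", device="cpu"):
        # Import lazily so `transformers` is only required when Whisper is used.
        from transformers import WhisperForConditionalGeneration, WhisperProcessor

        self.whisper_processor = WhisperProcessor.from_pretrained(model_name)
        self.whisper_model = WhisperForConditionalGeneration.from_pretrained(model_name).to(device)
        self.whisper_model.eval()

    def _ensure_whisper(self):
        # Hypothetical helper: load the model on first use if setup was skipped.
        if self.whisper_model is None:
            self.setup_whisper()

    def get_whisper_features(self):
        self._ensure_whisper()
        # Sampling rate the feature extractor expects (16 kHz for Whisper).
        target_sr = self.whisper_processor.feature_extractor.sampling_rate
        resampled = self.resample(target_sr)
        # One mono waveform per batch item: keep only the first channel.
        raw_speech = [np.asarray(item[0], dtype=np.float32) for item in resampled.audio_data]
        # Log-Mel input features with shape (batch, n_mels, frames).
        return self.whisper_processor(
            raw_speech, sampling_rate=target_sr, return_tensors="pt"
        ).input_features

    def get_whisper_embeddings(self):
        import torch

        input_features = self.get_whisper_features().to(self.whisper_model.device)
        with torch.inference_mode():
            # Encoder hidden states with shape (batch, positions, d_model).
            return self.whisper_model.get_encoder()(input_features).last_hidden_state

    def get_whisper_transcript(self) -> str:
        input_features = self.get_whisper_features().to(self.whisper_model.device)
        # Autoregressively decode token IDs from the audio features.
        generated_ids = self.whisper_model.generate(input_features)
        # Return the text for the first batch item only.
        return self.whisper_processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

A host class opts in by listing `WhisperMixin` among its base classes. Importing `transformers` inside `setup_whisper()` keeps it an optional dependency. The trade-off of the mixin approach is that these methods rely on attributes (`audio_data`, `sample_rate`, `resample`) that the mixin does not define itself.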