Open ernOho opened 3 months ago
Here are our known classes:
The corresponding evaluator class would be responsible for passing the Recording object to the new Audio2TextConverter implementation (green) and calling its convert method. It would then fetch the Recording object again, read self.recording.texts_timestamps, and use it in its evaluate method, which we have already implemented. For reference, here is the Recording class (see ./app/models/pydantic/sessions.py):
I am assuming one Recording object corresponds to one speaker (student). If we want to diarize beforehand to recognize different speakers, we should do it at a higher level, before passing the Recording object to the new Audio2TextConverter, and assign the diarized file paths to recording.audio_segments_paths.
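To make the attributes used in this issue concrete, here is a minimal sketch of that higher-level pre-diarization step. The Recording stand-in is an assumption: it is a plain dataclass holding only the three attributes mentioned in this issue (the real class in ./app/models/pydantic/sessions.py may differ), and pre_diarize is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Recording:
    """Stand-in for the real Recording model; only the fields named in this issue."""
    audio_file_path: str
    audio_segments_paths: Optional[list[str]] = None
    texts_timestamps: dict[str, float] = field(default_factory=dict)


def pre_diarize(recording: Recording, diarize: Callable[[str], list[str]]) -> Recording:
    """Hypothetical higher-level step: diarize before the converter sees the recording."""
    recording.audio_segments_paths = diarize(recording.audio_file_path)
    return recording


# Example with a dummy diarizer that returns the original file as the only segment
recording = pre_diarize(Recording("student_1.wav"), lambda path: [path])
```

With audio_segments_paths already set, the converter's diarization check would report the recording as diarized and skip its own diarization step.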
Here is the newly proposed class Audio2TextConverter and its dependencies/implementations:
Here are some proposed functionalities (more or less pseudo-code).
For Audio2TextConverter:
def _recording_is_diarized(self) -> bool:
    # the recording counts as diarized once segment paths have been assigned
    return self.recording.audio_segments_paths is not None
For SingleFileDiarization:
def diarize(self, origin_audio_file_path: str) -> list[str]:
    # single speaker: the original file is the only "diarized" file
    diarized_audio_paths: list[str] = [origin_audio_file_path]
    return diarized_audio_paths
For MultiFileDiarization:
def diarize(self, origin_audio_file_path: str) -> list[str]:
    # produce "diarized" files and return their paths
    diarized_audio_paths: list[str] = []  # TODO: populate using the diarization tool
    return diarized_audio_paths
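Both diarization variants share the same diarize signature, so the converter can depend on a common interface. A minimal sketch, assuming an abstract base class named Diarization (the name appears in this issue, but this definition is my assumption) and a hypothetical segmenter callable standing in for whichever diarization tool is chosen:

```python
from abc import ABC, abstractmethod
from typing import Callable


class Diarization(ABC):
    @abstractmethod
    def diarize(self, origin_audio_file_path: str) -> list[str]:
        """Return the paths of the diarized audio files."""


class SingleFileDiarization(Diarization):
    def diarize(self, origin_audio_file_path: str) -> list[str]:
        # One speaker per recording: the original file is the only segment
        return [origin_audio_file_path]


class MultiFileDiarization(Diarization):
    def __init__(self, segmenter: Callable[[str], list[str]]) -> None:
        # `segmenter` is a hypothetical wrapper around the diarization tool
        self._segmenter = segmenter

    def diarize(self, origin_audio_file_path: str) -> list[str]:
        return self._segmenter(origin_audio_file_path)


def fake_segmenter(path: str) -> list[str]:
    # Dummy tool for illustration: pretends to find two segments
    return [f"{path}.seg0", f"{path}.seg1"]


single = SingleFileDiarization().diarize("a.wav")
multi = MultiFileDiarization(fake_segmenter).diarize("a.wav")
```

Injecting the tool as a callable keeps MultiFileDiarization testable without committing to a specific diarization library at this stage.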
Audio2DiarizedSegments or Audio2TimestampedSegments are the ones we would use for the ReadingEvaluator (see first image). Note the Transcription and Diarization classes each one uses to produce the desired output, which will be saved in self.recording.texts_timestamps for later use in the evaluator classes.
For Audio2DiarizedSegments the implementation would look something like:
def convert(self) -> None:
    if not self._recording_is_diarized():
        # produce "diarized" files and save their paths to the recording object
        diarized_audios_paths: list[str] = self.diarization_module.diarize(self.recording.audio_file_path)
        self.recording.audio_segments_paths = diarized_audios_paths
    for path in self.recording.audio_segments_paths:
        # transcribe the diarized paths
        transcribed_audio_timestamps: dict[str, float] = self.transcription_module.transcribe(path)
        # take the first transcribed text (dict views do not support indexing)
        transcribed_audio = next(iter(transcribed_audio_timestamps))
        # maybe extract timestamp from path?
        time_stamp = path
        # save them to recording object
        self.recording.texts_timestamps.update({transcribed_audio: time_stamp})
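On the open question of extracting the timestamp from the path: one option is to encode the segment start time in the file name when the diarized files are written. A minimal sketch, assuming a hypothetical naming convention `<name>_<start_seconds>.wav` that is not defined anywhere in this issue:

```python
import re
from pathlib import Path

# Hypothetical convention: the file stem ends with "_<start time in seconds>"
_START_TIME = re.compile(r"_(\d+(?:\.\d+)?)$")


def timestamp_from_path(path: str) -> float:
    """Parse the segment start time (in seconds) from a diarized file path."""
    match = _START_TIME.search(Path(path).stem)
    if match is None:
        raise ValueError(f"no timestamp found in {path!r}")
    return float(match.group(1))


start = timestamp_from_path("segments/student_1_12.5.wav")
```

If the diarization tool already returns start times alongside the paths, passing them through directly would be more robust than parsing file names.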
Just for reference, Audio2Sentences would need something akin to:
def convert(self) -> None:
    if not self._recording_is_diarized():
        # produce "diarized" files and save their paths to the recording object
        diarized_audios_paths: list[str] = self.diarization_module.diarize(self.recording.audio_file_path)
        self.recording.audio_segments_paths = diarized_audios_paths
    # transcribe the diarized paths
    for path in self.recording.audio_segments_paths:
        transcribed_audio, time_stamp = self.transcription_module.transcribe(path)
        # some processing to convert the 'transcribed_audio' to a sentence and assign the corresponding timestamp
        sentence: str = transcribed_audio
        sentence_timestamp: float = time_stamp
        # save them to recording object
        self.recording.texts_timestamps.update({sentence: sentence_timestamp})
The object that I will fetch at the Evaluator level is the texts_timestamps attribute in the Recording object (see class diagram). That is the purpose of saving the transcription results in self.recording.texts_timestamps.
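The evaluator flow described in this issue can be sketched as follows. Everything here is a stand-in under stated assumptions: Recording is a simplified dataclass, FakeConverter replaces a real Audio2TextConverter implementation, and the evaluate body (counting transcribed texts) is only a placeholder for the evaluate method that already exists.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Recording:
    """Simplified stand-in for the real Recording model."""
    audio_file_path: str
    audio_segments_paths: Optional[list[str]] = None
    texts_timestamps: dict[str, float] = field(default_factory=dict)


class FakeConverter:
    """Stand-in for an Audio2TextConverter implementation."""

    def __init__(self, recording: Recording) -> None:
        self.recording = recording

    def convert(self) -> None:
        # A real implementation would diarize and transcribe here
        self.recording.texts_timestamps.update({"hello world": 0.0})


class ReadingEvaluator:
    def __init__(self, recording: Recording, converter: FakeConverter) -> None:
        self.recording = recording
        self.converter = converter

    def evaluate(self) -> int:
        # 1. Let the converter fill recording.texts_timestamps in place
        self.converter.convert()
        # 2. Read the results back from the shared Recording object
        texts_timestamps = self.recording.texts_timestamps
        # 3. Placeholder metric: number of transcribed texts
        return len(texts_timestamps)


recording = Recording("student_1.wav")
score = ReadingEvaluator(recording, FakeConverter(recording)).evaluate()
```

Because the converter mutates the shared Recording object, the evaluator needs no return value from convert; it only reads texts_timestamps afterwards.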