Bruno-val-bus / student-helper


The ReadingEvaluator should use an Audio2TextConverter class. What is the behaviour of this class? -> Proposal: simply take in an audio file and return a string-timestamp pair. We need 2 new .py files in the services module: converters.py, converters_factory.py @ernOho #17

Open ernOho opened 3 months ago

ernOho commented 3 months ago
Bruno-val-bus commented 2 months ago

Here are our known classes: [image]

The corresponding evaluator class would be responsible for passing the Recording object to the new Audio2TextConverter implementation (green) and calling its convert method. It would then fetch the Recording object again, read self.recording.texts_timestamps and use it in the evaluate method we have already implemented. For reference, here is the Recording class (see ./app/models/pydantic/sessions.py): [image]
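
A minimal sketch of that flow, to make the hand-off concrete. The Recording dataclass below is only a stand-in for the pydantic model (it contains just the attributes named in this thread), and FakeAudio2TextConverter is a hypothetical stub, not the proposed implementation. The actual evaluation logic is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Recording:
    # Hypothetical stand-in for the pydantic Recording model; only the
    # attributes mentioned in this thread are included.
    audio_file_path: str
    audio_segments_paths: Optional[list[str]] = None
    texts_timestamps: dict[str, float] = field(default_factory=dict)


class FakeAudio2TextConverter:
    """Stub converter: writes a fixed transcription into the recording."""

    def __init__(self, recording: Recording) -> None:
        self.recording = recording

    def convert(self) -> None:
        # A real implementation would diarize and transcribe here.
        self.recording.texts_timestamps.update({"hello world": 0.0})


class ReadingEvaluator:
    def __init__(self, recording: Recording, converter: FakeAudio2TextConverter) -> None:
        self.recording = recording
        self.converter = converter

    def evaluate(self) -> dict[str, float]:
        # 1. Let the converter fill recording.texts_timestamps.
        self.converter.convert()
        # 2. Fetch the result from the recording; the existing evaluation
        #    logic would consume it here (omitted in this sketch).
        return self.recording.texts_timestamps


recording = Recording(audio_file_path="student_01.wav")
evaluator = ReadingEvaluator(recording, FakeAudio2TextConverter(recording))
result = evaluator.evaluate()
```

The evaluator never sees the transcription tooling directly; it only relies on convert() having filled texts_timestamps.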

I am assuming 1 Recording object corresponds to one speaker (student). If we want to diarize to recognize different speakers, we should do it at a higher level, before passing the Recording object to the new Audio2TextConverter, and assign the diarized files to recording.audio_segments_paths.

Here is the newly proposed Audio2TextConverter class with its dependencies/implementations: [image]

Here are some proposed functionalities (more or less pseudo-code).

For Audio2TextConverter:

def _recording_is_diarized(self) -> bool:
  # the recording counts as diarized once its segment paths have been set
  return self.recording.audio_segments_paths is not None

For SingleFileDiarization:

def diarize(self, origin_audio_file_path: str) -> list[str]:
  # no splitting: the original file is the only "diarized" file
  diarized_audio_paths: list[str] = [origin_audio_file_path]
  return diarized_audio_paths

For MultiFileDiarization:

def diarize(self, origin_audio_file_path: str) -> list[str]:
  # produce "diarized" files and return their paths
  # placeholder: use a diarization tool here to fill diarized_audio_paths
  raise NotImplementedError("use diarization tool")
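
The two diarization variants share one method signature, so they could sit behind a common interface. The sketch below assumes an abstract base class named Diarization (the name is taken from the class diagram wording, the ABC itself is an assumption). MultiFileDiarization stays a placeholder because no diarization tool has been chosen in this thread.

```python
from abc import ABC, abstractmethod


class Diarization(ABC):
    """Assumed common interface for the diarization modules."""

    @abstractmethod
    def diarize(self, origin_audio_file_path: str) -> list[str]:
        """Return the paths of the audio segments produced from the file."""


class SingleFileDiarization(Diarization):
    def diarize(self, origin_audio_file_path: str) -> list[str]:
        # No splitting: the original file is the only segment.
        return [origin_audio_file_path]


class MultiFileDiarization(Diarization):
    def diarize(self, origin_audio_file_path: str) -> list[str]:
        # Placeholder: a real diarization tool would be called here.
        raise NotImplementedError("plug in a diarization tool")


paths = SingleFileDiarization().diarize("student_01.wav")
```

With this shape a converter can hold any Diarization instance as its diarization_module without knowing which variant it got.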

Audio2DiarizedSegments or Audio2TimestampedSegments are the ones we would use for the ReadingEvaluator (see first image). Note the Transcription and Diarization classes that each one uses to produce the desired output, which will be saved in self.recording.texts_timestamps for later use in the evaluator classes.
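
Since the issue title asks for a converters_factory.py, one possible shape for it is a small registry that maps a name to a converter class. Everything here is an assumption for illustration: the create_converter function, the registry keys, and the empty converter classes are hypothetical and only show how an evaluator could request a converter by name.

```python
class Audio2TextConverter:
    """Minimal base class; real converters would also hold their modules."""

    def __init__(self, recording: object) -> None:
        self.recording = recording

    def convert(self) -> None:
        raise NotImplementedError


class Audio2DiarizedSegments(Audio2TextConverter):
    pass


class Audio2TimestampedSegments(Audio2TextConverter):
    pass


class Audio2Sentences(Audio2TextConverter):
    pass


# Hypothetical registry: the keys are illustrative names, not from the repository.
_CONVERTERS: dict[str, type[Audio2TextConverter]] = {
    "diarized_segments": Audio2DiarizedSegments,
    "timestamped_segments": Audio2TimestampedSegments,
    "sentences": Audio2Sentences,
}


def create_converter(name: str, recording: object) -> Audio2TextConverter:
    """Look up the converter class by name and instantiate it."""
    try:
        converter_class = _CONVERTERS[name]
    except KeyError:
        raise ValueError(f"unknown converter: {name!r}") from None
    return converter_class(recording)


converter = create_converter("diarized_segments", recording="dummy-recording")
```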

For Audio2DiarizedSegments the implementation would look something like:

def convert(self) -> None:
  if not self._recording_is_diarized():
    # not diarized yet: produce "diarized" files and save their paths to the recording object
    diarized_audios_paths: list[str] = self.diarization_module.diarize(self.recording.audio_file_path)
    self.recording.audio_segments_paths = diarized_audios_paths

  for path in self.recording.audio_segments_paths:
    # transcribe the diarized paths
    transcribed_audio_timestamps: dict[str, float] = self.transcription_module.transcribe(path)
    # take the first transcribed text (dict views cannot be indexed)
    transcribed_audio = next(iter(transcribed_audio_timestamps))
    # maybe extract timestamp from path?
    time_stamp = path
    # save them to recording object
    self.recording.texts_timestamps.update({transcribed_audio: time_stamp})
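
For anyone who wants to try the idea end to end, here is a runnable sketch with stubs. It assumes that diarization only runs when the recording has no segment paths yet, and that transcribe returns a dict of text to timestamp which is merged into texts_timestamps as is (the open question about extracting the timestamp from the path is sidestepped). Recording, StubDiarization and StubTranscription are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Recording:
    # Hypothetical stand-in for the pydantic Recording model.
    audio_file_path: str
    audio_segments_paths: Optional[list[str]] = None
    texts_timestamps: dict[str, float] = field(default_factory=dict)


class StubDiarization:
    def __init__(self) -> None:
        self.calls = 0

    def diarize(self, origin_audio_file_path: str) -> list[str]:
        # Single-file behaviour: the original file is the only segment.
        self.calls += 1
        return [origin_audio_file_path]


class StubTranscription:
    def transcribe(self, path: str) -> dict[str, float]:
        # Assumed return shape: {transcribed text: timestamp in seconds}.
        return {f"text of {path}": 0.0}


class Audio2DiarizedSegments:
    def __init__(self, recording, diarization_module, transcription_module) -> None:
        self.recording = recording
        self.diarization_module = diarization_module
        self.transcription_module = transcription_module

    def _recording_is_diarized(self) -> bool:
        return self.recording.audio_segments_paths is not None

    def convert(self) -> None:
        # Diarize only if no segment paths were assigned at a higher level.
        if not self._recording_is_diarized():
            self.recording.audio_segments_paths = self.diarization_module.diarize(
                self.recording.audio_file_path
            )
        # Transcribe every segment and store the text/timestamp pairs.
        for path in self.recording.audio_segments_paths:
            self.recording.texts_timestamps.update(
                self.transcription_module.transcribe(path)
            )


fresh = Recording(audio_file_path="a.wav")
Audio2DiarizedSegments(fresh, StubDiarization(), StubTranscription()).convert()

prediarized = Recording(audio_file_path="b.wav", audio_segments_paths=["b_0.wav", "b_1.wav"])
unused_diarization = StubDiarization()
Audio2DiarizedSegments(prediarized, unused_diarization, StubTranscription()).convert()
```

The second call shows the case where the segment paths were set beforehand: the diarization module is not invoked and both segments are transcribed.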

Just for reference, Audio2Sentences would need something akin to:

def convert(self) -> None:
  if not self._recording_is_diarized():
    # not diarized yet: produce "diarized" files and save their paths to the recording object
    diarized_audios_paths: list[str] = self.diarization_module.diarize(self.recording.audio_file_path)
    self.recording.audio_segments_paths = diarized_audios_paths

  # transcribe the diarized paths
  for path in self.recording.audio_segments_paths:
    transcribed_audio, time_stamp = self.transcription_module.transcribe(path)
    # some processing to convert the 'transcribed_audio' to a sentence and assign the corresponding timestamp
    sentence: str = transcribed_audio
    sentence_timestamp: float = time_stamp
    # save them to recording object
    self.recording.texts_timestamps.update({sentence: sentence_timestamp})
Bruno-val-bus commented 2 months ago

The object that I will fetch at the Evaluator level is the texts_timestamps attribute of the Recording object (see class diagram). That is the purpose of saving the transcription results in self.recording.texts_timestamps.

I am assuming 1 Recording object corresponds to one speaker (student).