NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

NeMo/tutorials/speaker_tasks/ASR_with_SpeakerDiarization needs confidence estimation #10283

Closed: jugal-sheth closed this issue 1 month ago.

jugal-sheth commented 2 months ago

While running the offline ASR_with_SpeakerDiarization tutorial, I noticed that the following function in `nemo/collections/asr/parts/utils/diarization_utils.py` treats the confidence as zero:

```python
from typing import Dict, List


def convert_word_dict_seq_to_ctm(
    word_dict_seq_list: List[Dict[str, float]], uniq_id: str = 'null', decimals: int = 3
) -> List[str]:
    """
    Convert word_dict_seq_list into a list containing transcription in CTM format.

    Args:
        word_dict_seq_list (list):
            List containing words and corresponding word timestamps in dictionary format.

            Example:
            >>> word_dict_seq_list = [
            ...     {'word': 'right', 'start_time': 0.0, 'end_time': 0.34, 'speaker': 'speaker_0'},
            ...     {'word': 'and', 'start_time': 0.64, 'end_time': 0.81, 'speaker': 'speaker_1'},
            ...     ...,
            ... ]

    Returns:
        ctm_lines_list (list):
            List containing the hypothesis transcript in CTM format.

            Example:
            >>> ctm_lines_list = ["my_audio_01 speaker_0 0.0 0.34 right 0",
            ...                   "my_audio_01 speaker_1 0.64 0.17 and 0", ...]
    """
    ctm_lines = []
    confidence = 0  # hard-coded: every word gets a confidence of 0
    for word_dict in word_dict_seq_list:
        spk = word_dict['speaker']
        stt = word_dict['start_time']
        # CTM stores the duration, not the end time
        dur = round(word_dict['end_time'] - word_dict['start_time'], decimals)
        word = word_dict['word']
        ctm_line_str = f"{uniq_id} {spk} {stt} {dur} {word} {confidence}"
        ctm_lines.append(ctm_line_str)
    return ctm_lines
```

Every CTM line therefore ends with a confidence of 0. Is it possible to use confidence estimation to return a real confidence value for each word?
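If per-word confidence values were available, the CTM conversion itself would only need a small change. The sketch below is a hypothetical variant, not part of NeMo: the function name `convert_word_dict_seq_to_ctm_with_conf`, the per-word `'confidence'` key, and the `default_confidence` parameter are all assumptions (the word dicts in the docstring above carry no such key). When the key is missing it falls back to the default, so the output matches the original behaviour apart from the number formatting.

```python
from typing import Any, Dict, List


def convert_word_dict_seq_to_ctm_with_conf(
    word_dict_seq_list: List[Dict[str, Any]],
    uniq_id: str = 'null',
    decimals: int = 3,
    default_confidence: float = 0.0,
) -> List[str]:
    """Hypothetical variant of convert_word_dict_seq_to_ctm.

    Writes a per-word confidence taken from an assumed 'confidence' key,
    falling back to default_confidence when the key is absent.
    """
    ctm_lines = []
    for word_dict in word_dict_seq_list:
        spk = word_dict['speaker']
        stt = word_dict['start_time']
        dur = round(word_dict['end_time'] - word_dict['start_time'], decimals)
        word = word_dict['word']
        # 'confidence' is an assumed key that you would have to populate yourself
        conf = round(float(word_dict.get('confidence', default_confidence)), decimals)
        ctm_lines.append(f"{uniq_id} {spk} {stt} {dur} {word} {conf}")
    return ctm_lines


words = [
    {'word': 'right', 'start_time': 0.0, 'end_time': 0.34, 'speaker': 'speaker_0', 'confidence': 0.92},
    {'word': 'and', 'start_time': 0.64, 'end_time': 0.81, 'speaker': 'speaker_1'},
]
print(convert_word_dict_seq_to_ctm_with_conf(words, uniq_id='my_audio_01'))
# ['my_audio_01 speaker_0 0.0 0.34 right 0.92', 'my_audio_01 speaker_1 0.64 0.17 and 0.0']
```

Keeping the fallback means callers that never attach confidences see the same placeholder value as before.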

Thank you kind Regards

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 7 days since being marked as stale.

tango4j commented 1 month ago

Hi. Neither NeMo ASR models nor NeMo diarization natively support confidence level on words. You need to implement your own way to estimate the confidence level if you need such values.
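As one possible do-it-yourself approach, a word-level confidence can be built by aggregating token-level scores. The sketch below assumes you can already obtain one probability per emitted subword token (for example, the decoder's maximum softmax probability for that token) and a mapping from each token to the word it belongs to. The function name `word_confidence_from_tokens` and its inputs are hypothetical, and min/mean/product aggregation is a common heuristic, not NeMo's method. The resulting values could be written into the last CTM field instead of the hard-coded 0.

```python
import math
from typing import List, Sequence


def word_confidence_from_tokens(
    token_probs: Sequence[float],
    token_word_ids: Sequence[int],
    num_words: int,
    aggregation: str = "min",
) -> List[float]:
    """Aggregate per-token probabilities into one confidence value per word.

    token_probs[i]    : probability assigned to emitted token i (assumed input)
    token_word_ids[i] : index of the word that token i belongs to (assumed input)
    """
    if len(token_probs) != len(token_word_ids):
        raise ValueError("token_probs and token_word_ids must have equal length")

    # Group token probabilities by the word they belong to
    groups: List[List[float]] = [[] for _ in range(num_words)]
    for prob, word_id in zip(token_probs, token_word_ids):
        groups[word_id].append(prob)

    confidences = []
    for probs in groups:
        if not probs:
            confidences.append(0.0)  # no tokens mapped to this word: no evidence
        elif aggregation == "min":
            confidences.append(min(probs))
        elif aggregation == "mean":
            confidences.append(sum(probs) / len(probs))
        elif aggregation == "prod":
            confidences.append(math.prod(probs))
        else:
            raise ValueError(f"unknown aggregation: {aggregation}")
    return confidences


# Two words: the first is split into two subword tokens, the second is one token
print(word_confidence_from_tokens([0.9, 0.6, 0.95], [0, 0, 1], num_words=2))
```

Taking the minimum is the most conservative choice: a single uncertain subword marks the whole word as uncertain, whereas the mean smooths it out. Whichever aggregation you pick, raw softmax probabilities are often overconfident, so calibrating them on held-out data is usually worthwhile.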