Forced-Alignment-and-Vowel-Extraction / fave-asr

Interface for automated transcription and time alignment of conversational interview data
https://forced-alignment-and-vowel-extraction.github.io/fave-asr/
GNU General Public License v3.0

[Bug]: No guarantee that assigned word-level speaker is same as assigned utterance-level speaker #7

Open chrisbrickhouse opened 7 months ago

chrisbrickhouse commented 7 months ago

What happened?

For reasons I have not identified, the speaker assigned to a given word is not necessarily the same as the speaker assigned to the utterance that contains it. You can see this in the test data for the utterance "I only need a few months." The utterance-level speaker is "SPEAKER_00", but the word-level speaker for "I" is "SPEAKER_01" (see the log output).

The consequences of this are unclear, but I am documenting the bug in case it becomes an issue later. In a future release, it would be nice to have a sanity check on the output to ensure that word-level speaker assignments are consistent with the speaker assigned to their parent utterance.
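A minimal sketch of what such a sanity check might look like, assuming each segment is a dict shaped like the one in the log output (a top-level "speaker" key plus a "words" list whose entries may carry their own "speaker"). The function name `find_speaker_mismatches` is hypothetical and not part of fave-asr; skipping words that have no speaker label is also an assumption.

```python
def find_speaker_mismatches(segment):
    """Return (index, word, word_speaker) for every word whose speaker
    differs from the speaker assigned to the enclosing segment.

    Hypothetical helper, not part of fave-asr.
    """
    parent_speaker = segment.get("speaker")
    mismatches = []
    for index, word in enumerate(segment.get("words", [])):
        word_speaker = word.get("speaker")
        # Assumption: words without a speaker label are skipped, not flagged.
        if word_speaker is not None and word_speaker != parent_speaker:
            mismatches.append((index, word.get("word"), word_speaker))
    return mismatches


# The segment reported in this issue, copied from the log output.
segment = {
    "start": 8.923,
    "end": 9.96,
    "text": " I only need a few months.",
    "words": [
        {"word": "I", "start": 8.923, "end": 9.147, "score": 0.865, "speaker": "SPEAKER_01"},
        {"word": "only", "start": 9.167, "end": 9.33, "score": 0.513, "speaker": "SPEAKER_00"},
        {"word": "need", "start": 9.35, "end": 9.472, "score": 0.895, "speaker": "SPEAKER_00"},
        {"word": "a", "start": 9.492, "end": 9.512, "score": 0.989, "speaker": "SPEAKER_00"},
        {"word": "few", "start": 9.553, "end": 9.695, "score": 0.929, "speaker": "SPEAKER_00"},
        {"word": "months.", "start": 9.736, "end": 9.96, "score": 0.575, "speaker": "SPEAKER_00"},
    ],
    "speaker": "SPEAKER_00",
}

print(find_speaker_mismatches(segment))
# -> [(0, 'I', 'SPEAKER_01')]
```

A check like this only reports the inconsistency. Whether to resolve it by trusting the utterance-level label, the word-level labels, or neither is a separate design decision that this sketch does not make.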

What operating system are you using?

Ubuntu

Relevant log output

{"start": 8.923, "end": 9.96, "text": " I only need a few months.",
 "words": [
  {"word": "I", "start": 8.923, "end": 9.147, "score": 0.865, "speaker": "SPEAKER_01"},
  {"word": "only", "start": 9.167, "end": 9.33, "score": 0.513, "speaker": "SPEAKER_00"},
  {"word": "need", "start": 9.35, "end": 9.472, "score": 0.895, "speaker": "SPEAKER_00"},
  {"word": "a", "start": 9.492, "end": 9.512, "score": 0.989, "speaker": "SPEAKER_00"},
  {"word": "few", "start": 9.553, "end": 9.695, "score": 0.929, "speaker": "SPEAKER_00"},
  {"word": "months.", "start": 9.736, "end": 9.96, "score": 0.575, "speaker": "SPEAKER_00"}
 ],
 "speaker": "SPEAKER_00"},
