MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License
2.53k stars, 243 forks

Resolve .txt output file edge cases #118

Closed: zacharygraber closed this pull request 8 months ago

zacharygraber commented 9 months ago

Problem

The code in get_speaker_aware_transcript(...) that writes the .txt output misbehaves in the edge case where only one speaker is identified in the input audio file. Obviously it's silly to run diarization on an audio file with only one speaker, but in production I can't trust my users not to do so.

The existing logic builds up a text string per speaker, then writes it to the file only when the speaker changes. If the speaker never changes (i.e., every dict in the list has the same speaker), nothing is ever written, and the output file is empty.
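To make the failure mode concrete, here is a minimal Python sketch of the accumulate-then-flush pattern described above. It is a hypothetical reconstruction, not the repository's actual code: the function name write_transcript_accumulating is made up, and each sentence is assumed to be a dict with "speaker" and "text" keys. With a single speaker, the flush branch never runs, so the file stays empty.

```python
import io


def write_transcript_accumulating(sentences, f):
    """Hypothetical sketch of the accumulate-then-flush pattern.

    Assumes each item in `sentences` is a dict with "speaker" and "text"
    keys; this is NOT the repository's exact implementation.
    """
    if not sentences:
        return
    previous_speaker = sentences[0]["speaker"]
    paragraph = ""
    for sentence_dict in sentences:
        speaker = sentence_dict["speaker"]
        if speaker != previous_speaker:
            # The finished paragraph is written only when the speaker changes
            f.write(f"{previous_speaker}: {paragraph}\n\n")
            paragraph = ""
            previous_speaker = speaker
        paragraph += sentence_dict["text"] + " "
    # No flush after the loop: if the speaker never changed, the branch
    # above never ran and nothing was written to `f`.


buf = io.StringIO()
write_transcript_accumulating(
    [{"speaker": "SPK 1", "text": "Sentence1"},
     {"speaker": "SPK 1", "text": "Sentence2"}],
    buf,
)
print(repr(buf.getvalue()))  # ''
```

Note that in this sketch the last speaker's paragraph is also never flushed, because the only write happens inside the speaker-change branch.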

Steps to reproduce

I reproduced this behavior with this audio file, but it should occur with any audio file that has a single identified speaker.

Solution

Instead of accumulating a paragraph and writing it to the file only when the speaker changes, write each sentence to the file on every iteration of the loop, and start a new paragraph (newlines plus the speaker label) whenever the speaker changes.
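A minimal Python sketch of this write-as-you-go approach follows. It is illustrative only: the name write_speaker_aware_transcript is hypothetical (the PR's function is get_speaker_aware_transcript(...), whose exact signature isn't shown here), and each sentence is again assumed to be a dict with "speaker" and "text" keys. Consecutive sentences from the same speaker still end up in one paragraph, and a single-speaker input now produces output.

```python
import io


def write_speaker_aware_transcript(sentences, f):
    """Hypothetical sketch: one paragraph per run of same-speaker sentences.

    Assumes each item in `sentences` is a dict with "speaker" and "text" keys.
    """
    if not sentences:
        return  # nothing to write; also avoids indexing an empty list
    previous_speaker = sentences[0]["speaker"]
    # Label the first paragraph before the loop starts
    f.write(f"{previous_speaker}: ")
    for sentence_dict in sentences:
        speaker = sentence_dict["speaker"]
        if speaker != previous_speaker:
            # Speaker changed: start a new paragraph labeled with the new speaker
            f.write(f"\n\n{speaker}: ")
            previous_speaker = speaker
        # Always write the sentence, so it is appended to the current paragraph
        f.write(sentence_dict["text"] + " ")


buf = io.StringIO()
write_speaker_aware_transcript(
    [{"speaker": "SPK 1", "text": "Sentence1"},
     {"speaker": "SPK 1", "text": "Sentence2"},
     {"speaker": "SPK 1", "text": "Sentence3"}],
    buf,
)
print(repr(buf.getvalue()))  # 'SPK 1: Sentence1 Sentence2 Sentence3 '
```

Because every sentence is written immediately, there is no pending buffer to forget at the end of the loop; the speaker label is the only thing that depends on a speaker change.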

zacharygraber commented 8 months ago

> Thanks for noticing this. Your new code removes the aggregation of texts that were attributed to the same speaker.
> Old output:
> SPK 1: Sentence1 Sentence2 Sentence3
> New output:
> SPK 1: Sentence1
> SPK 1: Sentence2
> SPK 1: Sentence3

I'm not sure how you produced this new output sample, but it doesn't match the output I get. Note the special case in my implementation:

if speaker != previous_speaker:
    # Speaker changed: start a new paragraph labeled with the new speaker
    f.write(f"\n\n{speaker}: ")
    ...

Newlines and the speaker name are only prepended when the speaker has changed. Otherwise, only the sentence text is written (and thus appended to the current paragraph).

Here's a short sample of output I generated with this code:

Speaker 0: Well, because you told me your full name and the year you graduated and what you got your degree in. 

Speaker 1: My name is Martha Lois Wilson-Willis, and I graduated in '46, although I entered in '41, but I stayed out of here during the war, came back, and I got a master's in '48, and my major was history. 

Speaker 0: That's what I would like to study too. 

Speaker 1: My master's was in American colonial history, and I went to work in the Indiana Historical Society in Indianapolis as a research librarian, and I did that for a couple, three years, and meantime got married, and my husband was in corporate employment, so he got transferred to Illinois. But my hometown was Indianapolis. 

Speaker 0: And why did you come to IU? 

Speaker 1: Well, my parents were both very active IU alums in the '20s and '30s. My father was class of '20, my mother was class of '22, and was Phi Beta Kappa, and they both worked hard in the alumni clubs in Indianapolis. My mother, I think, was one of the organizers of the women's club when that was first organized, and they were active. My father had been very active in the memorial fundraising that led to the building of this unit building, so we came down a lot. I think they had season tickets for football games, and we came to a lot of basketball games. I remember sitting in the stands when I was a preschooler when Branch McCracken was a player. 

Speaker 0: Oh, my goodness. Where did you live while you attended IU? 

I encourage you to pull the changes yourself and try it out if you don't believe me!