MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License
2.44k stars, 238 forks

Transcription for non-verbal/non-speech labels(laughter etc.)? #179

Open MelissaChen15 opened 2 months ago

MelissaChen15 commented 2 months ago

Hi,

Is it possible to include non-verbal/non-speech labels, like laughter, in the transcript?

Thank you!

MahmoudAshraf97 commented 2 months ago

Hi, is that possible in the original Whisper implementation? If not, then unfortunately it won't be possible here.

MelissaChen15 commented 2 months ago

Hi, thank you so much for replying to me!

Yes, the original Whisper outputs laughter as "Hahaha," but laughter does not seem to be transcribed here. (Currently, I am only concerned about laughter rather than other backchannels.)

Do you have any idea what might be the cause?

MahmoudAshraf97 commented 2 months ago

This is going to be tough to debug, but as a first step, make sure to disable Demucs using --no-stem and check that the Whisper transcription arguments are identical to the ones you used with the original Whisper.
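One way to make that comparison concrete is to count laughter-like tokens in the two transcripts (one from the original Whisper, one from this pipeline run with --no-stem) on the same audio. The sketch below is only an illustration: the regular expression and the helper names `find_laughter` and `missing_laughter` are assumptions made for this example, not part of Whisper or of this repository, and the pattern only covers simple forms such as "Hahaha" or a bracketed "[laughter]" tag.

```python
import re

# Assumed pattern (not from either project): runs of "ha"/"he" syllables,
# or bracketed tags such as [laughter] / (laughs).
LAUGHTER_RE = re.compile(
    r"\b(?:ha|he){2,}h?\b|[\[(]\s*laugh\w*\s*[\])]",
    re.IGNORECASE,
)


def find_laughter(text):
    """Return every laughter-like token found in a transcript string."""
    return [m.group(0) for m in LAUGHTER_RE.finditer(text)]


def missing_laughter(reference_text, candidate_text):
    """Count how many more laughter tokens the reference has than the candidate."""
    ref_count = len(find_laughter(reference_text))
    cand_count = len(find_laughter(candidate_text))
    return max(ref_count - cand_count, 0)


if __name__ == "__main__":
    # Hypothetical transcripts; replace with the text produced by each tool.
    original = "Well, Hahaha, that is funny."
    diarized = "Well, that is funny."
    print(find_laughter(original))
    print(missing_laughter(original, diarized))
```

If the count is still non-zero with Demucs disabled and matching transcription arguments, the difference is more likely to come from the transcription step itself than from source separation.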
