MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License
2.53k stars, 243 forks

Weird words repetitions on zh #88

Closed: terryops closed this issue 3 months ago

terryops commented 10 months ago

I've tested this project with English (default model) and it worked as expected, but when I ran the same audio with the large model, I got this error: RuntimeError: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size. When I switched to a different audio file in Chinese (using large-v2), it ran without error, but the output contains a lot of odd word repetitions. An English rendering of the Chinese below would look something like:

Speaker 1: Please don't don't hesitate to like ke and subscribe scribe

Speaker 1: 请不 不吝点 点赞 赞 订阅 阅

Speaker 0: 转发 发 打赏 赏支 支持 持明 明镜 镜与 与点 点点 点栏 栏目 目 不去 去锵锵 锵三人 人行 行了 了 其实 实我 我特 特想 想来 来 几次 次编 编导 导约 约我 我都 都是 是时 时间 间冲 冲突 突嘛 嘛 冲突 突呢 呢 那我 我就 就是 是很 很世 世俗 俗的 的认 认为 为利 利益 益最 最大

Update: Tested on Japanese and got the same result. Tested on French and it works well, just like English.
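In the Chinese sample above, each space-separated chunk begins with the last character of the previous chunk. As a stopgap for cleaning up output like this, and not a fix for whatever produces it, here is a minimal Python sketch that strips that overlap. The function name `collapse_boundary_repeats` and the `sep` parameter are made up for illustration; nothing here is part of whisper-diarization.

```python
def collapse_boundary_repeats(text: str, sep: str = "") -> str:
    """Strip from each chunk the prefix that repeats the end of the previous chunk.

    Hypothetical post-processing helper: it assumes the repetition pattern
    seen in the sample output, where chunks are separated by whitespace and
    the duplicated characters sit at the chunk boundary.
    """
    pieces = []
    prev = ""  # previous chunk exactly as it appeared in the raw output
    for chunk in text.split():
        overlap = 0
        # Find the longest prefix of this chunk that the previous chunk ends with.
        for k in range(min(len(prev), len(chunk)), 0, -1):
            if prev.endswith(chunk[:k]):
                overlap = k
                break
        remainder = chunk[overlap:]
        if remainder:  # a chunk that is entirely a repeat contributes nothing
            pieces.append(remainder)
        prev = chunk
    return sep.join(pieces)


sample = "请不 不吝点 点赞 赞 订阅 阅"
print(collapse_boundary_repeats(sample))  # prints 请不吝点赞订阅
```

This is a character-overlap heuristic, so it will also eat legitimate characters whenever a word happens to start with the letter the previous word ended on. That makes it unsafe for space-delimited languages such as English ("hesitate to" would lose the "t" in "to"), and it discards the original spacing.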

Toby1091 commented 10 months ago

I've been getting the same error with different languages (including English, actually). Any help is much appreciated.

XinyuZhou2000 commented 10 months ago

same error :(

AlbinGyllander commented 10 months ago

I got the same error when using whisper.cpp (https://github.com/ggerganov/whisper.cpp). Do you get the same results using the official Whisper from OpenAI? I decided to revert to that because the results were too inaccurate with anything else.

PiotrEsse commented 7 months ago

When I've played with the Whisper large model, different languages worked as expected. When I use WhisperX or other speed-focused implementations, I have issues with languages other than English.