ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License
35.65k stars 3.63k forks

Many repeated words with stream example. #731

Closed mrmachine closed 1 year ago

mrmachine commented 1 year ago

Is this a general problem with Whisper, or does it occur more often with the stream method? Maybe it happens more often at the end of a file, and the stream method is transcribing a new file every 30 seconds?

### Transcription 70 START | t0 = 1689245 ms | t1 = 1719245 ms

[00:00.000 --> 00:07.000]   Oh, we could use these to call.
[00:07.000 --> 00:08.000]   That's amazing.
[00:08.000 --> 00:09.000]   We could use these to call all the animals up.
[00:09.000 --> 00:10.000]   This up and go, "Good day, good day."
[00:10.000 --> 00:11.000]   It works fine.
[00:11.000 --> 00:12.000]   You live here.
[00:12.000 --> 00:13.000]   You live here.
[00:13.000 --> 00:14.000]   You want to take one of the small ones?
[00:14.000 --> 00:15.000]   Small.
[00:15.000 --> 00:16.000]   Small.
[00:16.000 --> 00:17.000]   Small.
[00:17.000 --> 00:18.000]   Small.
[00:18.000 --> 00:19.000]   Small.
[00:19.000 --> 00:20.000]   Small.
[00:20.000 --> 00:21.000]   Small.
[00:21.000 --> 00:22.000]   Small.
[00:22.000 --> 00:23.000]   Small.
[00:23.000 --> 00:24.000]   Small.
[00:24.000 --> 00:25.000]   Small.
[00:25.000 --> 00:26.000]   Small.
[00:26.000 --> 00:27.000]   Small.
[00:27.000 --> 00:28.000]   Small.
[00:28.000 --> 00:29.000]   Small.

### Transcription 70 END

### Transcription 71 START | t0 = 1718503 ms | t1 = 1748503 ms

[00:00.000 --> 00:07.000]   This is where I go.
[00:07.000 --> 00:14.000]   I'm going to live there.
[00:14.000 --> 00:21.000]   I'm going to live there.
[00:21.000 --> 00:28.000]   I'm going to live there.
[00:28.000 --> 00:38.000]   [BLANK_AUDIO]

### Transcription 71 END

leohuang2013 commented 1 year ago

Same issue here, using ggml-base.bin

...
[00:04:08.760 --> 00:04:09.920] I don't like him very much."
[00:04:09.920 --> 00:04:10.920] I don't know.
[00:04:10.920 --> 00:04:11.920] I don't know.
[00:04:11.920 --> 00:04:12.920] I don't know.
[00:04:12.920 --> 00:04:13.920] I don't know.
[00:04:13.920 --> 00:04:14.920] I don't know.
[00:04:14.920 --> 00:04:15.920] I don't know.
[00:04:15.920 --> 00:04:16.920] Hey, talk out.
[00:04:16.920 --> 00:04:17.920] I don't know.
[00:04:17.920 --> 00:04:18.920] I don't know.
[00:04:18.920 --> 00:04:25.400] This is kind of the street
...
[00:06:26.960 --> 00:06:27.960] Oh, I loved it.
[00:06:27.960 --> 00:06:29.360] It was very different.
[00:06:29.360 --> 00:06:30.360] Very different.
[00:06:30.360 --> 00:06:31.360] Very different.
[00:06:31.360 --> 00:06:33.600] Yeah, especially his fourth student.
[00:06:33.600 --> 00:06:35.160] I told you.
...

Note that I was transcribing a file, not a stream.

I attached the audio file here.

adam.webm

Abdullahmamunal commented 1 year ago

Not sure fined

abelbabel commented 1 year ago

Similar to #719 and #612.

gcr commented 1 year ago

It's definitely a problem with inference. Slicing the audio up and re-transcribing from the failed part works just fine.
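
To find the point to slice and re-transcribe from, one option is to scan the transcript for runs of identical consecutive segments. The following is a minimal sketch, assuming the bracketed timestamp format shown in the transcripts above; `find_repeated_runs`, `to_seconds`, and the `min_repeats` threshold are hypothetical helpers for illustration and are not part of whisper.cpp.

```python
import re

# Matches segment lines such as
# "[00:14.000 --> 00:15.000]   Small." or
# "[00:04:09.920 --> 00:04:10.920] I don't know."
SEGMENT_RE = re.compile(r"\[([\d:.]+) --> ([\d:.]+)\]\s*(.*)")


def to_seconds(stamp):
    """Convert "MM:SS.mmm" or "HH:MM:SS.mmm" to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def find_repeated_runs(lines, min_repeats=3):
    """Return (start_seconds, text, count) for each run of identical
    consecutive segments that is at least min_repeats long."""
    runs = []
    prev_text, run_start, count = None, 0.0, 0
    for line in lines:
        match = SEGMENT_RE.match(line.strip())
        if not match:
            continue  # skip headers, blank lines, and elisions
        start, _end, text = match.groups()
        text = text.strip()
        if text == prev_text:
            count += 1
        else:
            # The previous run just ended; record it if it was long enough.
            if prev_text is not None and count >= min_repeats:
                runs.append((run_start, prev_text, count))
            prev_text, run_start, count = text, to_seconds(start), 1
    # Record a run that extends to the end of the transcript.
    if prev_text is not None and count >= min_repeats:
        runs.append((run_start, prev_text, count))
    return runs


sample = [
    "[00:12.000 --> 00:13.000]   You live here.",
    "[00:13.000 --> 00:14.000]   You want to take one of the small ones?",
    "[00:14.000 --> 00:15.000]   Small.",
    "[00:15.000 --> 00:16.000]   Small.",
    "[00:16.000 --> 00:17.000]   Small.",
]
print(find_repeated_runs(sample))  # [(14.0, 'Small.', 3)]
```

The start time of the first reported run is a candidate offset for slicing the audio. The threshold of three is an arbitrary choice; genuine speech can repeat a short phrase twice, as with "You live here." in the first transcript.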

ggerganov commented 1 year ago

Should be resolved via f19e23fbd108ec3ac458c7a19b31c930719e7a94

realcarlos commented 1 year ago

20230509-113110

I pulled the latest code and still hit this issue. I am using ggml-small.en.bin.