ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Hallucinations and repeats of previous transcriptions when running without reloading model #2445

Closed nchudleigh closed 3 weeks ago

nchudleigh commented 4 weeks ago

I'm running into an issue where subsequent runs of the model bleed in results from a previous recording when the model stays loaded in memory.

I've checked all the inputs to the full transcribe call and there is no difference between the two calls, so it seems that something internal to whisper.cpp is not being reset.

Is there anything I need to call within whisper.cpp to reset the state of the model?
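One way to isolate runs (a sketch only, assuming whisper.cpp's public `whisper.h` API; the function names below — `whisper_init_state`, `whisper_full_with_state`, `whisper_free_state` — are from that header, while `transcribe_isolated` and its parameters are hypothetical) is to keep the model loaded but give each transcription its own decoding state:

```cpp
#include <stdio.h>

#include "whisper.h"

// Sketch: the model context stays loaded across calls, but each
// transcription gets a fresh whisper_state, so nothing can carry
// over between runs.
void transcribe_isolated(struct whisper_context * ctx,
                         const float * samples, int n_samples) {
    struct whisper_state * state = whisper_init_state(ctx);

    struct whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);

    // Run inference against the per-call state, not the shared one.
    whisper_full_with_state(ctx, state, params, samples, n_samples);

    const int n_segments = whisper_full_n_segments_from_state(state);
    for (int i = 0; i < n_segments; i++) {
        printf("%s", whisper_full_get_segment_text_from_state(state, i));
    }

    whisper_free_state(state); // discard all per-run state
}
```

This trades a small per-call allocation cost for a hard guarantee that no decoder state survives between recordings.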

Example results:

1st transcription (good result): Hello my name is Mark and this is a boat it floats on water and it's very slow.

2nd transcription (completely different audio that does not contain any of this text): Hello my name is Mark. Hello my name is Mark. Hello my name is Mark. Hello my name is Mark.

ggerganov commented 3 weeks ago

Can you confirm whether the following patch fixes the issue:

diff --git a/src/whisper.cpp b/src/whisper.cpp
index 9c7c66b..c8ee0f8 100644
--- a/src/whisper.cpp
+++ b/src/whisper.cpp
@@ -1033,6 +1033,8 @@ static void whisper_kv_cache_clear(struct whisper_kv_cache & cache) {
         cache.cells[i].seq_id.clear();
     }
     cache.head = 0;
+
+    ggml_backend_buffer_clear(cache.buffer, 0);
 }

 static void whisper_kv_cache_seq_rm(
nchudleigh commented 3 weeks ago

Initial testing looks good; I'm sending it out to the early release group as well.

ggerganov commented 3 weeks ago

I went ahead and pushed the patch to master. That said, it's a bit strange that clearing the cache makes a difference at all, since the KQ mask should already mask away the unused data from previous runs; this makes me think there might be some other issue at hand. Let me know if you continue to experience this problem.

nchudleigh commented 3 weeks ago

@ggerganov It appears to fix the leaks so far, but I will have more feedback from users in the next couple days.

I am also testing the new v3 turbo model on this release candidate, which seems to hallucinate (repetition) a bit. Are you interested in feedback on it? I can spin up a new issue if so.

ggerganov commented 3 weeks ago

I've mostly accepted that v3 models are busted, so I don't expect much from v3-turbo. Feedback is always appreciated though.

nchudleigh commented 3 weeks ago

@ggerganov I feel I might as well document it, on the off chance a solution can be found, since the performance is otherwise incredible.

(Un)fortunately, the hallucination is not consistent; it mostly manifests as repetition.