Closed nchudleigh closed 3 weeks ago
Can you confirm that the following patch fixes the issue:
diff --git a/src/whisper.cpp b/src/whisper.cpp
index 9c7c66b..c8ee0f8 100644
--- a/src/whisper.cpp
+++ b/src/whisper.cpp
@@ -1033,6 +1033,8 @@ static void whisper_kv_cache_clear(struct whisper_kv_cache & cache) {
         cache.cells[i].seq_id.clear();
     }
     cache.head = 0;
+
+    ggml_backend_buffer_clear(cache.buffer, 0);
 }
 
 static void whisper_kv_cache_seq_rm(
Initial testing looks good, sending out to early release group as well.
I went ahead and pushed the patch to master. It's a bit strange that clearing the cache makes a difference at all, since the KQ mask should already mask away the unused data from previous runs, so this makes me think that there might be some other issue at hand. Let me know if you continue to experience this problem.
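To make the distinction above concrete, here is a minimal toy sketch of the difference between resetting only the cache metadata and also zeroing the backing buffer. It is not whisper.cpp code: `toy_kv_cache` and every `toy_*` function are hypothetical names invented for illustration, and the "mask" is reduced to a per-cell `used` flag. The point is only that a read path which honors the mask never sees stale values, so if zeroing the buffer changes the output, some read path is presumably touching cells the mask should have hidden.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a KV cache: per-cell metadata plus raw storage.
// This is NOT the real whisper_kv_cache layout.
struct toy_kv_cache {
    std::vector<bool>  used;      // metadata: which cells hold valid data
    std::vector<float> data;      // backing storage, one value per cell
    std::size_t        head = 0;  // next cell to write

    explicit toy_kv_cache(std::size_t n) : used(n, false), data(n, 0.0f) {}
};

// Append one value at the head and mark the cell as used.
void toy_write(toy_kv_cache & c, float v) {
    assert(c.head < c.data.size());
    c.data[c.head] = v;
    c.used[c.head] = true;
    ++c.head;
}

// Metadata-only reset: the old values stay in the buffer.
void toy_clear_metadata(toy_kv_cache & c) {
    std::fill(c.used.begin(), c.used.end(), false);
    c.head = 0;
}

// Metadata reset plus zeroing the storage, analogous in spirit to the patch.
void toy_clear_all(toy_kv_cache & c) {
    toy_clear_metadata(c);
    std::fill(c.data.begin(), c.data.end(), 0.0f);
}

// Read path that honors the mask: unused cells contribute nothing.
float toy_sum_masked(const toy_kv_cache & c) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < c.data.size(); ++i) {
        if (c.used[i]) {
            sum += c.data[i];
        }
    }
    return sum;
}

// Read path that ignores the mask: stale values leak into the result.
float toy_sum_unmasked(const toy_kv_cache & c) {
    float sum = 0.0f;
    for (float v : c.data) {
        sum += v;
    }
    return sum;
}
```

In this toy, after writing a few values and calling `toy_clear_metadata`, `toy_sum_masked` returns zero while `toy_sum_unmasked` still returns the old total; only `toy_clear_all` makes both agree. Whether whisper.cpp has an analogous unmasked read is not established anywhere in this thread.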
@ggerganov It appears to fix the leaks so far, but I will have more feedback from users in the next couple days.
I am also testing the new v3 turbo model on this release candidate, which seems to hallucinate (repetition) a bit. Are you interested in feedback on it? I can spin up a new issue if so.
I've mostly accepted that v3 models are busted, so I don't expect much from v3-turbo. Feedback is always appreciated though.
@ggerganov I feel I might as well document it, on the off chance a solution can be found; the performance is otherwise incredible.
(Un)fortunately the hallucination is not consistent. It mostly manifests as repetition.
I'm running into an issue where subsequent runs of the model bleed over results from a previous recording when the model stays loaded in memory.
I've checked all the inputs to the full transcribe call and there is no difference between the two calls, but it seems that something internal to whisper.cpp is not being reset.
Is there anything that I need to call within whisper.cpp to reset the state of the model?
Example results:
1st transcription (good result): Hello my name is Mark and this is a boat it floats on water and it's very slow.
2nd transcription (completely different audio that does not contain any of this transcribed text): Hello my name is Mark. Hello my name is Mark. Hello my name is Mark. Hello my name is Mark.