argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License

Index out of range error in TextDecoder #63

Closed cgfarmer4 closed 7 months ago

cgfarmer4 commented 8 months ago

Occasionally I'm seeing an index out of range crash on segmentLogProbs[index] after a long period of silence. https://github.com/argmaxinc/WhisperKit/blob/main/Sources/WhisperKit/Core/TextDecoder.swift#L518-L521

Swift/ContiguousArrayBuffer.swift:600: Fatal error: Index out of range

Two ways I could see guarding against this:

  1. Use Swift's zip.
  2. Check the index against segmentLogProbs.count.

for (token, logProb) in zip(segmentTokens, segmentLogProbs) {
    tokenProbs.append([token: logProb])
}

for (index, token) in segmentTokens.enumerated() {
    if index < segmentLogProbs.count {
        tokenProbs.append([token: segmentLogProbs[index]])
    }
}
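
For reference, a self-contained sketch of how the two guards behave when the arrays differ in length. The function names and sample values are made up for illustration and are not taken from WhisperKit; only `zip` and `enumerated()` are standard library calls.

```swift
// Hypothetical helpers for illustration only; not WhisperKit API.

/// Pairs tokens with log probs using zip, which stops at the shorter sequence.
func pairWithZip(_ tokens: [Int], _ logProbs: [Float]) -> [[Int: Float]] {
    return zip(tokens, logProbs).map { pair in [pair.0: pair.1] }
}

/// Pairs tokens with log probs, skipping any index that logProbs does not have.
func pairWithBoundsCheck(_ tokens: [Int], _ logProbs: [Float]) -> [[Int: Float]] {
    var result: [[Int: Float]] = []
    for (index, token) in tokens.enumerated() where index < logProbs.count {
        result.append([token: logProbs[index]])
    }
    return result
}

// Made-up sample data: three tokens but only two log probs.
let sampleTokens: [Int] = [101, 102, 103]
let sampleLogProbs: [Float] = [-0.1, -0.5]

let zipped = pairWithZip(sampleTokens, sampleLogProbs)
let checked = pairWithBoundsCheck(sampleTokens, sampleLogProbs)
```

With mismatched input, both variants silently drop the unmatched trailing token instead of crashing. That avoids the fatal error but can also hide whatever caused the mismatch.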

Happy to PR either one, but I'm unsure whether I'm missing a reason for the code being the way it is.

atiorh commented 8 months ago

Thanks @cgfarmer4, are you able to share an audio file that reproduces this? It sounds like the root cause could be a bug and we might need to dig deeper.

cgfarmer4 commented 8 months ago

Unfortunately I don't have an audio file, as the crash has been happening while streaming. I can record the audio in my app, so I'll do that for a while during testing to see if I can reproduce this again.

cgfarmer4 commented 8 months ago
[image: stack trace screenshot]

Stack trace. My AudioModifier class has functions equivalent to the ones used in ContentView.

ZachNagengast commented 8 months ago

@cgfarmer4 Are you able to check the sizes of the variables when there is a crash? I suspect segmentTokens.count is 0 when there is silence, but segmentLogProbs is always initialized with at least 1 value. Will look into this further, thanks for the report.
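
One way to capture those sizes without a debugger attached might be to log any mismatch just before the indexing happens. A minimal sketch follows; the helper name and message format are hypothetical and not part of WhisperKit.

```swift
// Hypothetical diagnostic helper; not part of WhisperKit.

/// Returns a description of the two counts when they differ, or nil when they match.
func describeMismatch(tokenCount: Int, logProbCount: Int) -> String? {
    guard tokenCount != logProbCount else { return nil }
    return "segmentTokens.count=\(tokenCount), segmentLogProbs.count=\(logProbCount)"
}

// Example call with made-up counts; real code would pass the actual array counts.
if let message = describeMismatch(tokenCount: 3, logProbCount: 2) {
    print("Count mismatch before indexing: \(message)")
}
```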

ZachNagengast commented 8 months ago

Pushed a change to fix this just now. Please confirm whether it resolves the crashes 🙏

cgfarmer4 commented 8 months ago

Appears resolved on my side. Thanks for the quick fix!

cgfarmer4 commented 7 months ago

I've been able to reproduce this twice in the last week on the latest main. Both times I was using streaming and the new distil models, but I'm not sure that's related.

[images: Screenshot 2024-03-29 at 8:47:46 PM, Screenshot 2024-03-26 at 8:21:00 PM]

ZachNagengast commented 7 months ago

@cgfarmer4 The latest release, https://github.com/argmaxinc/WhisperKit/releases/tag/v0.5.0, includes new logic that should keep the tokens and log probs from getting out of sync. If you see the issue again on main, please let us know. https://github.com/argmaxinc/WhisperKit/blob/b2fd48d75bc0595ec2e4b9c99d76b9a276e8dd02/Sources/WhisperKit/Core/TextDecoder.swift#L583-L585
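
The linked lines are not reproduced here. As a general illustration only, one way to rule out this kind of drift is to store each token together with its log prob, so the two sequences can never have different lengths. The types below are hypothetical and are an assumption about a possible design, not the v0.5.0 implementation.

```swift
// Hypothetical types for illustration; not the WhisperKit v0.5.0 code.

/// A token and its log probability, kept together as one value.
struct ScoredToken: Equatable {
    let token: Int
    let logProb: Float
}

/// Accumulates scored tokens; tokens and log probs are derived from the same array.
struct SegmentBuffer {
    private(set) var entries: [ScoredToken] = []

    /// Appends a token and its log prob in a single step.
    mutating func append(token: Int, logProb: Float) {
        entries.append(ScoredToken(token: token, logProb: logProb))
    }

    var tokens: [Int] { entries.map { $0.token } }
    var logProbs: [Float] { entries.map { $0.logProb } }
}

// Made-up sample values.
var buffer = SegmentBuffer()
buffer.append(token: 101, logProb: -0.2)
buffer.append(token: 102, logProb: -1.3)
```

Because both views are computed from `entries`, `buffer.tokens.count` and `buffer.logProbs.count` are equal by construction.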

cgfarmer4 commented 7 months ago

Awesome @ZachNagengast! I'll close this again and reopen with any new reports. Cheers