Open smoores-dev opened 1 month ago
The lines possibly related are:
if (tokenIndex === 0 && tokenObject.text === '[_BEG_]' && tokenObject.offsets.from === 0) {
    currentCorrectionTimeOffset = segmentObject.offsets.from / 1000
}
and
startTime = tokenObject.offsets.from / 1000
endTime = tokenObject.offsets.to / 1000
The code assumes that offsets.from and offsets.to are always available.
Anyway, the whisper.cpp build used by default has become slightly outdated now (early April 2024). Can you try with a newer whisper.cpp build (v1.6.0 seems to be the latest published with actual binaries) to see if the problem was maybe fixed since then?
You can set a custom main executable with whisperCpp.executablePath.
If that doesn't help, I'll see how I can work around the issue to prevent the error.
Yeah, this is unfortunately happening even when building directly from HEAD on the master branch of the whisper.cpp repo! I just ran the whisper.cpp command with the same flags as echogarden and found the problem token; at the end of the first very long string of "I love you"s, the last "you" token looks like this:
{
    "text": " you",
    "id": 291,
    "p": 0.960787,
    "t_dtw": -1
}
It has neither timestamps nor offsets!
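For anyone wanting to check their own output for the same problem, here is a minimal sketch of a scan for tokens that lack offsets. The interface shapes are inferred only from the JSON fragment above and from the offsets.from / offsets.to accesses quoted earlier; the function names, the sample data, and the first token's id are hypothetical and not part of echogarden or whisper.cpp.

```typescript
// Hypothetical shapes, inferred from the token JSON quoted in this thread.
interface TokenOffsets {
  from: number // milliseconds
  to: number   // milliseconds
}

interface WhisperToken {
  text: string
  id: number
  p: number
  t_dtw?: number
  offsets?: TokenOffsets // absent on the problematic token
}

// Type guard: true only when both offsets are present and finite.
function hasOffsets(token: WhisperToken): token is WhisperToken & { offsets: TokenOffsets } {
  return token.offsets !== undefined &&
    Number.isFinite(token.offsets.from) &&
    Number.isFinite(token.offsets.to)
}

// Returns the indexes of tokens that cannot be timed directly.
function findTokensMissingOffsets(tokens: WhisperToken[]): number[] {
  const missing: number[] = []
  tokens.forEach((token, index) => {
    if (!hasOffsets(token)) {
      missing.push(index)
    }
  })
  return missing
}

// Sample data: the second entry mirrors the token reported above;
// the first entry (including its id) is a made-up placeholder.
const sampleTokens: WhisperToken[] = [
  { text: ' love', id: 1, p: 0.9, offsets: { from: 1000, to: 1200 } },
  { text: ' you', id: 291, p: 0.960787, t_dtw: -1 },
]

const missingIndexes = findTokensMissingOffsets(sampleTokens)
```

Running a check like this before the division by 1000 would at least turn the crash into a diagnosable condition.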
Thanks a lot for the investigation.
I guess the issue can be reported on the whisper.cpp repository, if it hasn't already.
For now, I can work around the issue by filling in missing timestamps based on neighboring timestamps.
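One way such a fill-in could look, as a rough sketch only and not necessarily what will ship: each run of tokens without offsets is spread evenly across the gap between the previous token's end and the next token's start, falling back to the segment boundaries at the edges. It assumes the segment itself always carries offsets; the type and function names here are hypothetical.

```typescript
interface Offsets {
  from: number // milliseconds
  to: number   // milliseconds
}

interface Token {
  text: string
  offsets?: Offsets
}

interface TimedToken {
  text: string
  offsets: Offsets
}

// Fills missing token offsets by splitting the gap between timed neighbors
// evenly across each run of untimed tokens. Assumes segment offsets exist.
function fillMissingOffsets(tokens: Token[], segment: Offsets): TimedToken[] {
  const filled: TimedToken[] = []
  let i = 0

  while (i < tokens.length) {
    const current = tokens[i].offsets
    if (current) {
      filled.push({ text: tokens[i].text, offsets: { ...current } })
      i++
      continue
    }

    // Find the end (exclusive) of the run of tokens without offsets.
    let j = i
    while (j < tokens.length && !tokens[j].offsets) {
      j++
    }

    // Gap boundaries: previous token's end and next token's start,
    // or the segment boundaries when the run touches an edge.
    const gapStart = filled.length > 0 ? filled[filled.length - 1].offsets.to : segment.from
    const next = j < tokens.length ? tokens[j].offsets : undefined
    const gapEnd = Math.max(gapStart, next ? next.from : segment.to)
    const step = (gapEnd - gapStart) / (j - i)

    for (let k = i; k < j; k++) {
      const n = k - i
      filled.push({
        text: tokens[k].text,
        offsets: { from: gapStart + n * step, to: gapStart + (n + 1) * step },
      })
    }
    i = j
  }

  return filled
}

// Illustrative data only: the middle token has no offsets.
const exampleTokens: Token[] = [
  { text: ' love', offsets: { from: 1000, to: 1200 } },
  { text: ' you' },
  { text: '.', offsets: { from: 1500, to: 1600 } },
]

const repaired = fillMissingOffsets(exampleTokens, { from: 0, to: 2000 })
```

Even spacing is a crude choice; when the neighbors leave no gap, the filled tokens collapse to zero duration instead of producing negative times.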
I'm not actively developing this package at the moment (busy with other things), so I can't really predict exactly when the workaround would be published (maybe a few weeks, I don't know).
Yeah I'll open an issue against whisper.cpp as well; hopefully they'll fix it on their end! Thanks for taking a look
Would it be easier if I were to open a PR that attempted to work around this as you described, by looking at the timestamps/offsets of the surrounding tokens? I know that PR review can also be quite a bit of work, so no worries if you'd rather handle it yourself! I was just reminded of the monstrous number of open issues against the whisper.cpp repo haha
I don't think I need or want pull requests (so far I've closed the two that I got). This has been a personal project of mine. Maybe I'd prefer to keep the code 100% my own for now.
Even if I get the code, I can't guarantee when it is going to be published since I have other partially committed code destined for the next release.
Also, testing that it works correctly may take more time than actually writing the code.
So, no need for a pull request. I can try to quickly write and test a workaround locally, but it's not likely to be published during the next week (or possibly a bit longer).
Understood, sounds good!
When recognizing some borderline-pathological audio content, whisper.cpp apparently sometimes outputs tokens without offset properties, resulting in the following error:
Here's the audio asset in question:
https://github.com/user-attachments/assets/68a6bac2-6461-4787-8943-821f6c5d0311
It's a TTS narration of a passage that includes, at a few points, the phrase "I love you" several dozen times in a row.