Macoron / whisper.unity

Running a speech-to-text model (whisper.cpp) in Unity3d on your local machine.
MIT License

libwhisper_cuda.dll get_segment issue #89

Open: doyoon-k opened this issue 1 month ago

doyoon-k commented 1 month ago

Hi, thank you for this wonderful tool! I was trying to use this wrapper in my project, and it works very well with the regular version. However, when I turn on CUDA in the preference settings, it always throws the error below no matter what I say into the mic. I can't tell whether this is a bug in the DLL or whether I just configured something wrong, so I am leaving a question here.

OverflowException: TimeSpan overflowed because the duration is too long.
System.TimeSpan.Interval (System.Double value, System.Int32 scale) (at <467a840a914a47078e4ae9b0b1e8779e>:0)
System.TimeSpan.FromMilliseconds (System.Double value) (at <467a840a914a47078e4ae9b0b1e8779e>:0)
Whisper.WhisperSegment..ctor (System.Int32 index, System.String text, System.UInt64 start, System.UInt64 end) (at Assets/com.whisper.unity/Runtime/WhisperResult.cs:67)
Whisper.WhisperWrapper.GetSegment (System.Int32 i, Whisper.WhisperParams param) (at Assets/com.whisper.unity/Runtime/WhisperWrapper.cs:233)
Whisper.WhisperWrapper.NewSegmentCallback (System.Int32 nNew, Whisper.WhisperParams param) (at Assets/com.whisper.unity/Runtime/WhisperWrapper.cs:208)
Whisper.WhisperWrapper.NewSegmentCallbackStatic (System.IntPtr ctx, System.IntPtr state, System.Int32 nNew, System.IntPtr userDataPtr) (at Assets/com.whisper.unity/Runtime/WhisperWrapper.cs:198)
(wrapper native-to-managed) Whisper.WhisperWrapper.NewSegmentCallbackStatic(intptr,intptr,int,intptr)
Whisper.WhisperWrapper.InferenceWhisper (System.Single[] samples, Whisper.Native.WhisperNativeParams param) (at Assets/com.whisper.unity/Runtime/WhisperWrapper.cs:181)
Whisper.WhisperWrapper.GetText (System.Single[] samples, System.Int32 frequency, System.Int32 channels, Whisper.WhisperParams param) (at Assets/com.whisper.unity/Runtime/WhisperWrapper.cs:136)
Whisper.WhisperWrapper+<>c__DisplayClass17_0.<GetTextAsync>b__0 () (at Assets/com.whisper.unity/Runtime/WhisperWrapper.cs:169)
System.Threading.Tasks.Task`1[TResult].InnerInvoke () (at <467a840a914a47078e4ae9b0b1e8779e>:0)
System.Threading.Tasks.Task.Execute () (at <467a840a914a47078e4ae9b0b1e8779e>:0)
--- End of stack trace from previous location where exception was thrown ---
Whisper.WhisperWrapper.GetTextAsync (System.Single[] samples, System.Int32 frequency, System.Int32 channels, Whisper.WhisperParams param) (at Assets/com.whisper.unity/Runtime/WhisperWrapper.cs:170)
Whisper.WhisperManager.GetTextAsync (System.Single[] samples, System.Int32 frequency, System.Int32 channels) (at Assets/com.whisper.unity/Runtime/WhisperManager.cs:242)
ChatExample.OnRecordStop (Whisper.Utils.AudioChunk recordedAudio) (at Assets/Scripts/LLMVoiceChat.cs:127)
System.Runtime.CompilerServices.AsyncMethodBuilderCore+<>c.<ThrowAsync>b__7_0 (System.Object state) (at <467a840a914a47078e4ae9b0b1e8779e>:0)
UnityEngine.UnitySynchronizationContext+WorkRequest.Invoke () (at <55fbbbd17b724c15b6abe8c1a3e3289c>:0)
UnityEngine.UnitySynchronizationContext.Exec () (at <55fbbbd17b724c15b6abe8c1a3e3289c>:0)
UnityEngine.UnitySynchronizationContext.ExecuteTasks () (at <55fbbbd17b724c15b6abe8c1a3e3289c>:0)

All I could figure out is that at WhisperWrapper.cs:231 the "start" value returned by the DLL function is a nonsensical number, so an overflow happens in WhisperResult.cs:67.
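Until the root cause is found, a defensive conversion can keep a corrupted segment from crashing the whole transcription. The code below is a minimal sketch, not a fix. It assumes that segment timestamps are reported in 10 ms units, as in whisper.cpp. `SegmentTimestamp.TryToTimeSpan` is a hypothetical helper and is not part of whisper.unity. It rejects any value whose millisecond count would exceed what `TimeSpan` can represent, which is roughly the condition that makes `TimeSpan.FromMilliseconds` throw `OverflowException`.

```csharp
using System;

// Hypothetical helper (not part of whisper.unity): converts a whisper.cpp
// segment timestamp to a TimeSpan without throwing on corrupted values.
public static class SegmentTimestamp
{
    // Assumption: whisper.cpp reports timestamps in 10 ms units (centiseconds).
    private const ulong MillisecondsPerUnit = 10;

    // Largest whole number of milliseconds a TimeSpan can hold.
    private static readonly ulong MaxMilliseconds =
        (ulong)(TimeSpan.MaxValue.Ticks / TimeSpan.TicksPerMillisecond);

    public static bool TryToTimeSpan(ulong units, out TimeSpan result)
    {
        // A garbage value (e.g. a negative int64 reinterpreted as ulong) would
        // overflow TimeSpan, so report failure instead of throwing.
        if (units > MaxMilliseconds / MillisecondsPerUnit)
        {
            result = TimeSpan.Zero;
            return false;
        }

        // Build the TimeSpan from ticks to avoid floating-point rounding;
        // the range check above guarantees this multiplication fits in a long.
        long milliseconds = (long)(units * MillisecondsPerUnit);
        result = TimeSpan.FromTicks(milliseconds * TimeSpan.TicksPerMillisecond);
        return true;
    }
}
```

A caller such as the segment constructor could then skip or log a segment whose start fails this check instead of throwing. The corrupted text would still need a separate fix.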

These are the settings for Whisper Manager and Microphone Record: [screenshots]

System info: [screenshot]

Visual Studio info:
Microsoft Visual Studio Community 2022, Version 17.9.7 (VisualStudio.17.Release/17.9.7+34902.65)
Microsoft .NET Framework Version 4.8.09032

Installed Version: Community

Visual C++ 2022 00482-90000-00000-AA399 Microsoft Visual C++ 2022

NVIDIA CUDA 12.2 Wizards 12.2

NVIDIA Nsight Visual Studio Edition 2023.2.0.23143 NVIDIA Nsight Visual Studio Edition - CUDA support 2023.2.0.23143

Visual Studio Tools for Unity 17.9.2.0 Visual Studio Tools for Unity

Macoron commented 1 month ago

Thanks for reporting this bug.

I reproduced this error locally. It looks like it happens only when CUDA is enabled and a quantized model is used: the ggml-small model works fine, but ggml-small-q5_1 fails to transcribe and throws the overflow exception.

I'm not sure what is causing this error. Both the start timestamp and the text seem to be corrupted, while the end timestamp isn't. This could indicate an error in the C# bindings, but the binding code is identical to the CPU version. I also tested the original whisper.cpp example, and it works fine with CUDA and a quantized model.

doyoon-k commented 1 month ago

That is weird, hmm. I guess I'll go with a non-quantized model for now, then. Thanks for checking!