Closed: fire closed this issue 7 months ago.
Note that the stats code was carried over from a VoIP system and isn't fully relevant: https://github.com/V-Sekai/v-sekai.whisper/blob/main/src/speech_processor.cpp#L335-L360
Previously the pipeline was: audio effect → speech processor → network → audio output.
The proposed design is: audio effect → speech processor, where instead of packaging the audio for transmission we feed it into the whisper.cpp model and get text back.
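As a sketch of that design (assuming a recent whisper.cpp API; the model path and the captured sample buffer are placeholders, not code from this repo), the speech processor could hand a chunk of 16 kHz mono float samples to `whisper_full` and read back the recognized text:

```cpp
#include <string>
#include <vector>

#include "whisper.h" // from whisper.cpp

// Sketch: run one buffer of 16 kHz mono float samples through whisper.cpp
// and return the transcribed text. Error handling is deliberately minimal.
std::string transcribe(whisper_context * ctx, const std::vector<float> & pcm16k) {
    whisper_full_params params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.print_progress = false; // keep the console quiet inside the audio effect

    if (whisper_full(ctx, params, pcm16k.data(), (int) pcm16k.size()) != 0) {
        return {};
    }

    std::string text;
    const int n_segments = whisper_full_n_segments(ctx);
    for (int i = 0; i < n_segments; ++i) {
        text += whisper_full_get_segment_text(ctx, i);
    }
    return text;
}

// Usage (model filename is a placeholder):
// whisper_context * ctx = whisper_init_from_file_with_params(
//         "ggml-base.en.bin", whisper_context_default_params());
// std::string text = transcribe(ctx, captured_samples);
// whisper_free(ctx);
```

This replaces the old "package and send over the network" step with a direct call into the model; the surrounding capture/threading logic would stay in the speech processor.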
The whisper.cpp print output happens here: https://github.com/V-Sekai/v-sekai.whisper/blob/main/src/speech.cpp#L90-L95
There is a lot of unused code that probably needs cleanup.
Current Status
As per the recent discussion, the symbols are now being exported correctly. However, there is still uncertainty about the functionality and implementation of some components.
The missing part in the current implementation is the audio data stream input (mono? stereo?). This needs to be sent to whisper.cpp. After wrapping the core algorithm, the audio must be resampled to the format whisper.cpp expects (16 kHz mono float). The audio effect can then be attached to a microphone or speech recording to output text. In the proposed design, the audio effect should have an accessor to the whisper GGML model data, as a GGUF resource.

Please note that due to personal circumstances, I will be away this weekend.
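The downmix-and-resample step could use the engine's own resampler, but a minimal self-contained sketch (naive linear interpolation; the source rate of 32 kHz below is just an example, a real microphone bus would typically be 44.1 or 48 kHz) looks like this:

```cpp
#include <cstddef>
#include <vector>

// Downmix interleaved stereo float samples to mono by averaging the channels.
std::vector<float> stereo_to_mono(const std::vector<float> & interleaved) {
    std::vector<float> mono(interleaved.size() / 2);
    for (size_t i = 0; i < mono.size(); ++i) {
        mono[i] = 0.5f * (interleaved[2 * i] + interleaved[2 * i + 1]);
    }
    return mono;
}

// Naive linear-interpolation resampler. Good enough for a sketch; a
// production audio effect would use a windowed-sinc or engine resampler.
std::vector<float> resample_linear(const std::vector<float> & in,
                                   int src_rate, int dst_rate) {
    if (in.empty() || src_rate == dst_rate) return in;
    const double ratio = static_cast<double>(src_rate) / dst_rate;
    const size_t out_len = static_cast<size_t>(in.size() / ratio);
    std::vector<float> out(out_len);
    for (size_t i = 0; i < out_len; ++i) {
        const double pos = i * ratio;
        const size_t i0 = static_cast<size_t>(pos);
        const size_t i1 = (i0 + 1 < in.size()) ? i0 + 1 : i0;
        const float frac = static_cast<float>(pos - i0);
        out[i] = in[i0] + frac * (in[i1] - in[i0]);
    }
    return out;
}
```

The output of `resample_linear(stereo_to_mono(frame), src_rate, 16000)` is then exactly what the transcription call needs.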