V-Sekai / godot-whisper

A GDExtension addon for the Godot Engine that enables real-time audio transcription. It supports OpenCL on most platforms and Metal on Apple devices, and runs on a separate thread.
MIT License

Integration Design of Whisper GGML Data Model #8

Closed. fire closed this issue 7 months ago.

fire commented 7 months ago

Current Status

As per the recent discussion, it seems that the symbols are now being exported correctly. However, there is still uncertainty about the functionality and implementation of certain components.

The missing piece in the current implementation is the audio data stream input (whether it is mono or stereo is still an open question), which needs to be sent to whisper.cpp. After wrapping the core algorithm, the audio must be resampled to the desired format. The audio effect can then be attached to a microphone or a speech recording to output text. In the proposed design, the audio effect should have an accessor to the Whisper GGML data model, stored as a GGUF resource.
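To make the resampling step concrete, here is a minimal sketch of the conversion. whisper.cpp consumes 32-bit float mono PCM at 16 kHz, so a stereo capture has to be downmixed and resampled first. The helper names `downmix_stereo_to_mono` and `resample_linear` are hypothetical and not taken from this repository, and the source sample rate is left as a parameter because it depends on the project's audio settings. Linear interpolation is used only to keep the example short; it applies no low-pass filter, so a real implementation would likely want a proper resampler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): average interleaved
// stereo frames (L, R, L, R, ...) into a single mono channel.
std::vector<float> downmix_stereo_to_mono(const std::vector<float> &interleaved) {
    assert(interleaved.size() % 2 == 0);
    std::vector<float> mono(interleaved.size() / 2);
    for (std::size_t i = 0; i < mono.size(); ++i) {
        mono[i] = 0.5f * (interleaved[2 * i] + interleaved[2 * i + 1]);
    }
    return mono;
}

// Hypothetical helper: naive linear-interpolation resampler.
// No anti-aliasing filter is applied, so this is illustrative only.
std::vector<float> resample_linear(const std::vector<float> &in, int src_rate, int dst_rate) {
    assert(src_rate > 0 && dst_rate > 0);
    if (in.empty()) {
        return {};
    }
    const std::size_t out_len =
            in.size() * static_cast<std::size_t>(dst_rate) / static_cast<std::size_t>(src_rate);
    std::vector<float> out(out_len);
    const double step = static_cast<double>(src_rate) / static_cast<double>(dst_rate);
    for (std::size_t i = 0; i < out_len; ++i) {
        const double pos = static_cast<double>(i) * step;
        std::size_t i0 = static_cast<std::size_t>(pos);
        if (i0 >= in.size()) {
            i0 = in.size() - 1; // guard against floating-point rounding
        }
        const std::size_t i1 = (i0 + 1 < in.size()) ? i0 + 1 : in.size() - 1;
        const float frac = static_cast<float>(pos - static_cast<double>(i0));
        out[i] = in[i0] * (1.0f - frac) + in[i1] * frac;
    }
    return out;
}
```

A caller would chain the two, for example `resample_linear(downmix_stereo_to_mono(frames), mix_rate, 16000)`, and pass the resulting buffer to whisper.cpp.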

Please note that due to personal circumstances, I will be away this weekend.

fire commented 7 months ago

Note that the stats were originally from a VoIP system and aren't fully relevant here: https://github.com/V-Sekai/v-sekai.whisper/blob/main/src/speech_processor.cpp#L335-L360

Previously, the pipeline was: audio effect to speech processor to network to audio output.

The current design could be audio effect to speech processor, where instead of packaging the audio for transmission we pass it to the whisper.cpp model and get text back.
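One way to picture the speech processor stage in that design is as a buffer that collects incoming samples and hands fixed-size windows to a transcriber instead of a network packetizer. The sketch below is an assumption about how this could look, not the repository's code: `TranscriptionBuffer` and its `Transcriber` callback are hypothetical names, and the callback stands in for whatever wraps the whisper.cpp call.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: accumulates mono samples and invokes a transcriber
// callback once per full window. The callback is a stand-in for the
// wrapper around whisper.cpp; nothing here is the project's actual API.
class TranscriptionBuffer {
public:
    using Transcriber = std::function<std::string(const std::vector<float> &)>;

    TranscriptionBuffer(std::size_t window_samples, Transcriber transcriber)
        : window_samples_(window_samples), transcriber_(std::move(transcriber)) {}

    // Appends samples and returns one text result per completed window.
    std::vector<std::string> push(const std::vector<float> &samples) {
        pending_.insert(pending_.end(), samples.begin(), samples.end());
        std::vector<std::string> results;
        const auto window = static_cast<std::ptrdiff_t>(window_samples_);
        while (window_samples_ > 0 && pending_.size() >= window_samples_) {
            std::vector<float> chunk(pending_.begin(), pending_.begin() + window);
            pending_.erase(pending_.begin(), pending_.begin() + window);
            results.push_back(transcriber_(chunk));
        }
        return results;
    }

    // Number of samples waiting for the next full window.
    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t window_samples_;
    Transcriber transcriber_;
    std::vector<float> pending_;
};
```

Because the issue notes that transcription runs on a separate thread, a real version would also need synchronization between the audio callback that calls `push` and the worker that runs the model; that is left out here for brevity.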

fire commented 7 months ago

The part that prints the whisper.cpp output is here: https://github.com/V-Sekai/v-sekai.whisper/blob/main/src/speech.cpp#L90-L95

There is a lot of unused code that probably needs to be cleaned up.