V-Sekai / godot-whisper

A GDExtension addon for the Godot Engine that enables realtime audio transcription. It supports OpenCL on most platforms and Metal on Apple devices, and runs on a separate thread.
MIT License

Use thread #31

Closed aiaimimi0920 closed 6 months ago

aiaimimi0920 commented 6 months ago

@fire First of all, thank you for your help with the Windows build issue #29.

@Ughuuu In my own use, I found that calling the whisper method on the main thread made the main thread freeze regularly, leaving the application unresponsive.

So I made some changes and I hope my changes can be helpful to you. Feel free to make any changes you deem appropriate.

The code is based on https://github.com/tableos/mina/blob/main/native/stt_whisper.cc#L109

https://github.com/V-Sekai/godot-whisper/assets/153103332/a1fed48f-59a7-4ae9-9da6-6c0734a14b35

The main modifications include the following points:

  1. Start whisper from a separate thread, which prevents the main thread from freezing.
  2. Register "speech_to_text" as a singleton: I don't think there is a situation where two "speech_to_text" nodes are needed at the same time.
  3. ResourceFormatLoaderWhisper now loads only the file path, not the model content: I think it is enough to obtain the model content through the file path when needed, so there is no need to store the content in WhisperResource.
  4. Reload the model when "set_use_gpu" is called.
  5. Modified the transcribe logic: audio data is now added through the "add_audio_buffer" method, and the parsed text is obtained through the "update_transcribed_msgs" signal. The main reason is that the current logic of calling transcribe at 5-second intervals does not meet my needs. I want the content to update in real time, so I keep the historical audio data and discard it once it can be parsed into a complete sentence segment.
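
Points 1 and 5 can be illustrated with a minimal Python sketch. It is not the addon's C++ code: the class `StreamingTranscriber`, the `transcribe_fn` callback, and its `(text, is_complete)` return shape are hypothetical names chosen for illustration. Only `add_audio_buffer` and `update_transcribed_msgs` come from the description above.

```python
import queue
import threading


class StreamingTranscriber:
    """Hypothetical stand-in for the threaded speech-to-text singleton."""

    def __init__(self, transcribe_fn, on_update):
        # transcribe_fn(samples) -> (text, is_complete); assumed interface.
        self._transcribe_fn = transcribe_fn
        # on_update plays the role of the "update_transcribed_msgs" signal.
        self._on_update = on_update
        self._pending = queue.Queue()
        # Audio that has not yet been resolved into a complete sentence.
        self._history = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def add_audio_buffer(self, samples):
        """Called from the main thread; only enqueues, never runs inference."""
        self._pending.put(list(samples))

    def stop(self):
        """Ask the worker to finish the queued chunks, then wait for it."""
        self._pending.put(None)
        self._worker.join()

    def _run(self):
        while True:
            chunk = self._pending.get()
            if chunk is None:
                break
            self._history.extend(chunk)
            text, is_complete = self._transcribe_fn(list(self._history))
            # Note: this callback fires on the worker thread.
            self._on_update(text, is_complete)
            if is_complete:
                # A full sentence was recognised, so the old audio is dropped.
                self._history.clear()
```

The caller keeps pushing small buffers and receives partial text after each one, so nothing blocks the main loop. In an engine, the callback would probably have to be deferred back to the main thread before it touches any scene state.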

Thank you again for this project

Note that I only tested on the Windows platform

Ughuuu commented 6 months ago

Interesting changes. Do you want to continue the PR and try to merge it? I could leave comments if you want. If not I will try to change it and see if I can merge it.

aiaimimi0920 commented 6 months ago

If you have any suggestions, I would be happy to modify this PR

Ughuuu commented 6 months ago

Added. Basically, the main thing was a lot of leftover code and comments that haven't been removed. The other point is making the script you wrote reusable for others. My idea is that people don't use the singleton directly but go through a higher-level node/script (in this case the mic test.gd you wrote; name it something else, like I named it before), and that events aren't attached to it directly but to another node. That way it can be reused.
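
A rough Python sketch of that layering, with every class name here (`Signal`, `SpeechToTextSingleton`, `CaptureStream`) being a hypothetical stand-in rather than anything in the addon:

```python
class Signal:
    """Tiny observer list standing in for an engine signal."""

    def __init__(self):
        self._handlers = []

    def connect(self, handler):
        self._handlers.append(handler)

    def emit(self, *args):
        # Iterate over a copy so handlers may connect more handlers safely.
        for handler in list(self._handlers):
            handler(*args)


class SpeechToTextSingleton:
    """Hypothetical low-level singleton that produces transcription updates."""

    def __init__(self):
        self.update_transcribed_msgs = Signal()


class CaptureStream:
    """Hypothetical higher-level node that user code connects to instead."""

    def __init__(self, backend):
        self.transcription_updated = Signal()
        # Forward backend updates; user code never touches the singleton.
        backend.update_transcribed_msgs.connect(self.transcription_updated.emit)
```

User scripts would connect to `transcription_updated` on the wrapper, so the singleton can change without breaking them.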

Ughuuu commented 6 months ago

But thanks for this, it's much faster with the optimisations you did, and also the high pass/filter you did, quite impressive.

aiaimimi0920 commented 6 months ago

I need some help, @Ughuuu: #32

fire commented 6 months ago

Merged. Thanks!