V-Sekai / godot-whisper

A GDExtension addon for the Godot Engine that enables real-time audio transcription. It supports OpenCL on most platforms and Metal on Apple devices, and runs on a separate thread.
MIT License

.wav file issues #74

Closed: JBlank19 closed this issue 3 months ago

JBlank19 commented 5 months ago

Good day!

The audio transcription node does not work with most .wav files, for example the .wav files Godot itself produces when recording the microphone. However, it does work with the capture node. The problem seems to be the format of the input data.

fire commented 5 months ago

As far as I know, we wrote the capture node because we couldn't get the record node to work 3 to 4 years ago.

AllenDang commented 5 months ago

@fire My god, I just wasted like half a day trying to figure out why the recorded wav doesn't work...

fire commented 5 months ago

@AllenDang Here is the original design documentation: https://github.com/godotengine/godot-proposals/issues/2013

Ughuuu commented 4 months ago

A .wav file will not work because the API expects the raw sample buffer, i.e. what you would get after decoding the .wav, rather than the file itself:

Array transcribe(PackedFloat32Array buffer, String initial_prompt, int audio_ctx);

If you just read the .wav file as-is, I think it would also contain extra encoding data, such as the file header, on top of the samples.
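To illustrate the point above, here is a small Python sketch (standard library only) that writes a tiny mono 16-bit PCM .wav into memory and then inspects the raw file bytes. The file starts with a RIFF/WAVE header (44 bytes for the plain PCM files Python's `wave` module writes) before any audio, and the samples after it are 16-bit integers, not float32 values. Feeding those bytes directly to `transcribe` would mix header bytes into the audio and misread the sample format. The helper `make_wav_bytes` is hypothetical and only exists for this demonstration.

```python
import io
import struct
import wave
from array import array


def make_wav_bytes(samples, rate=16000):
    """Build an in-memory mono 16-bit PCM .wav file (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit signed integers
        w.setframerate(rate)
        # wave expects frames in native byte order, which array("h") provides.
        w.writeframes(array("h", samples).tobytes())
    return buf.getvalue()


raw = make_wav_bytes([0, 16384, -16384, 32767])

# The first bytes are container metadata, not audio.
print(raw[:4], raw[8:12], raw[12:16])  # b'RIFF' b'WAVE' b'fmt '

# The audio only starts after the 44-byte header, stored as little-endian int16.
print(struct.unpack_from("<4h", raw, 44))  # (0, 16384, -16384, 32767)
```

Real-world files can carry additional chunks (for example `LIST` metadata) before the `data` chunk, so the 44-byte offset is only valid for the simple layout shown here; a proper parser should walk the chunks.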

Ughuuu commented 3 months ago

I will update the README to say that .wav files don't work and that transcribe currently only accepts a float32 buffer. For .wav input, you have to decode the file yourself.
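As a sketch of what "decode it yourself" can involve, the Python example below (standard library only) parses a .wav container with `wave`, scales 16-bit integers to floats in [-1.0, 1.0), averages interleaved channels down to mono, and linearly resamples. Several details are assumptions rather than anything the addon documents: the 16 kHz target rate (whisper models generally expect 16 kHz mono input), the restriction to 16-bit PCM, and the name `decode_wav`, which is a hypothetical helper and not part of godot-whisper. The resampler is naive linear interpolation with no anti-aliasing filter.

```python
import io
import wave
from array import array

TARGET_RATE = 16000  # assumption: whisper models expect 16 kHz mono audio


def decode_wav(data: bytes, target_rate: int = TARGET_RATE) -> array:
    """Decode a 16-bit PCM .wav file into mono float32 samples in [-1.0, 1.0).

    Hypothetical helper; other sample widths are rejected in this sketch.
    """
    with wave.open(io.BytesIO(data), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        channels = w.getnchannels()
        rate = w.getframerate()
        pcm = array("h")
        # wave returns frames in native byte order, matching array("h").
        pcm.frombytes(w.readframes(w.getnframes()))

    # Scale int16 to float and downmix interleaved channels by averaging.
    mono = [
        sum(pcm[i:i + channels]) / channels / 32768.0
        for i in range(0, len(pcm), channels)
    ]

    # Naive linear-interpolation resampling to the target rate.
    if rate != target_rate and mono:
        n_out = int(len(mono) * target_rate / rate)
        step = rate / target_rate
        out = []
        for j in range(n_out):
            pos = j * step
            k = int(pos)
            frac = pos - k
            nxt = mono[min(k + 1, len(mono) - 1)]
            out.append(mono[k] * (1.0 - frac) + nxt * frac)
        mono = out

    return array("f", mono)  # float32 samples
```

Usage would look like `samples = decode_wav(open("recording.wav", "rb").read())`, where the file name is only an example. In a Godot project, the same conversion would have to be done in GDScript or C++ before filling the PackedFloat32Array passed to transcribe.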