VinAIResearch / PhoWhisper

PhoWhisper: Automatic Speech Recognition for Vietnamese (2024)
BSD 3-Clause "New" or "Revised" License

C++ Version #1

Closed: vietanhdev closed this issue 2 months ago

vietanhdev commented 9 months ago

Hello, thank you for sharing this project with the public. It's great to see a project like this supporting the Vietnamese language. I'm wondering if there are any plans to port this model to C++, similar to the whisper.cpp project (https://github.com/ggerganov/whisper.cpp). A C++ implementation would allow us to run this model on low-power PCs or mobile devices. I'm also curious whether this model addresses the issues encountered with OpenAI's Whisper model when working with Vietnamese. The related URLs for the issues are:

datquocnguyen commented 9 months ago

Could you or someone else give it a try and observe the outcome? Your input would be greatly appreciated.

duykhanhbk commented 8 months ago

@vietanhdev I tried converting the PhoWhisper models (base and large) to CT2 (CTranslate2, the C++ backend used by faster-whisper), and it works well. I'll try whisper.cpp later (I really like llama.cpp). One problem I see with PhoWhisper: when I run inference in chunks, the transcription output is very long (it does not correspond to the chunk size) compared with Whisper-large-v3. @datquocnguyen I think this is because of the dataset used for fine-tuning; is this correct?
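For anyone who wants to reproduce the CT2 conversion mentioned above, here is a minimal sketch and not the exact commands used in this thread. It only builds the command line for CTranslate2's `ct2-transformers-converter` tool (installed with the `ctranslate2` and `transformers` packages). The model ID `vinai/PhoWhisper-base`, the output directory name, the `float16` quantization, and the presence of `tokenizer.json` and `preprocessor_config.json` in the model repository are all assumptions; adjust them to your setup.

```python
import shlex
from typing import List, Optional


def build_ct2_convert_command(
    model_id: str,
    output_dir: str,
    quantization: Optional[str] = "float16",
) -> List[str]:
    """Build the argument list for ct2-transformers-converter.

    model_id and output_dir are illustrative; nothing is executed here.
    """
    cmd = [
        "ct2-transformers-converter",
        "--model", model_id,
        "--output_dir", output_dir,
        # Assumption: these files exist in the source model repository.
        "--copy_files", "tokenizer.json", "preprocessor_config.json",
    ]
    if quantization:
        cmd += ["--quantization", quantization]
    return cmd


if __name__ == "__main__":
    # Assumed model ID and output directory, for illustration only.
    command = build_ct2_convert_command("vinai/PhoWhisper-base", "PhoWhisper-base-ct2")
    # Print the command so it can be reviewed and run manually in a shell.
    print(shlex.join(command))
```

Printing the command instead of running it keeps the sketch free of side effects; pass the list to `subprocess.run(command, check=True)` once the arguments look right.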

vietanhdev commented 8 months ago

Could you contribute your instructions / conversion code to this repo? It would be very helpful. Thank you! 😁

chiendo97 commented 3 months ago

You can find many CT2 versions of the PhoWhisper models on Hugging Face here.

bvqbao commented 3 months ago

Has anyone succeeded in converting PhoWhisper to gguf? I tried the script provided in whisper.cpp (convert-h5-to-ggml.py), but the output of the gguf model is garbage. Something is wrong here.

ILG2021 commented 3 months ago

I have used CTranslate2 for a long time and suggest using it: it is faster than whisper.cpp and easy to use from Python. PhoWhisper has been fine-tuned on accurate data, so the hallucinations may disappear. I would also suggest releasing a PhoWhisper version based on large-v3, which may be more accurate.
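As a rough illustration of using a CT2 model from Python, here is a minimal sketch built on the `faster-whisper` package, which runs Whisper models through CTranslate2. The model directory, the audio path, and the `int8` CPU settings are assumptions, and the `Segment` tuple is a hypothetical stand-in used only to show the formatting helper without loading a model.

```python
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in with the fields read from faster-whisper segments."""
    start: float
    end: float
    text: str


def format_segments(segments: Iterable) -> List[str]:
    """Render each segment as '[start -> end] text' with times in seconds."""
    return [f"[{s.start:.2f} -> {s.end:.2f}] {s.text.strip()}" for s in segments]


def transcribe_vietnamese(model_dir: str, audio_path: str) -> List[str]:
    """Transcribe one audio file with a converted CT2 model.

    model_dir is assumed to be a directory produced by a CT2 conversion.
    """
    # Imported lazily so the helper above works without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_dir, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, _info = model.transcribe(audio_path, language="vi", beam_size=5)
    return format_segments(segments)


if __name__ == "__main__":
    # Demonstrates only the formatting step, using a hand-made segment.
    print(format_segments([Segment(0.0, 1.5, " xin chào ")]))
```

Calling `transcribe_vietnamese("PhoWhisper-base-ct2", "audio.wav")` would run the actual model; both arguments are placeholder names.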

chiendo97 commented 3 months ago

@bvqbao Try adding the -nt option when running whisper.cpp. The hallucination should disappear.
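A minimal sketch of what that invocation could look like, assuming a locally built whisper.cpp binary and an already converted model file. The binary path `./main`, the model path, and the audio file name are hypothetical; `-m`, `-f`, `-l`, and `-nt` (short for `--no-timestamps`) are the whisper.cpp CLI options for model, input file, language, and disabling timestamps.

```python
import shlex
from typing import List


def build_whisper_cpp_command(
    binary: str,
    model_path: str,
    audio_path: str,
    language: str = "vi",
    no_timestamps: bool = True,
) -> List[str]:
    """Build a whisper.cpp command line; nothing is executed here."""
    cmd = [binary, "-m", model_path, "-f", audio_path, "-l", language]
    if no_timestamps:
        # -nt disables timestamp output, as suggested in the comment above.
        cmd.append("-nt")
    return cmd


if __name__ == "__main__":
    # All three paths are placeholders for illustration.
    command = build_whisper_cpp_command("./main", "ggml-model.bin", "audio.wav")
    print(shlex.join(command))
```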