Closed iorilu closed 1 year ago
Hello. Yes, some users have reported this before. Which operating system are you using? Are you using CUDA or CPU? Thanks
I am using Windows 10 with a 3060 GPU, and miniconda as the environment.
The GPU is working; just the SRT file is not generated.
One other issue: live_transcribe also does not work as expected. There is no response after running live mode; it seems the volume is too low. I changed Threshold = 0.01, and then the program started to work, but the recognition results were very poor.
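For context on why lowering Threshold helps a quiet laptop mic: live mode only feeds a chunk of audio to the model once its volume crosses the threshold. A minimal stdlib sketch of such a volume gate (hypothetical helper names, not the tool's actual code):

```python
import math

def rms(samples):
    """Root-mean-square volume of a chunk of samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def should_transcribe(samples, threshold=0.2):
    """A chunk is only sent to the model once its volume crosses the threshold."""
    return rms(samples) >= threshold

quiet_mic = [0.03, -0.02, 0.04, -0.03]   # a quiet laptop microphone (RMS ~0.03)
print(should_transcribe(quiet_mic, threshold=0.2))   # False: chunk dropped
print(should_transcribe(quiet_mic, threshold=0.01))  # True: chunk accepted
```

This also suggests why very low thresholds give poor results: chunks of near-silence (background noise) start reaching the model too.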
Same problem! I can see all the transcription results, but sometimes no output file is generated.
For me, short files (under 10 minutes) with --model small (the default) work fine: the tool shows "Transcription results written to 'D:\test' directory" and the SRT files can be found in the current folder. But with --model tiny or the other models there is no output at all after transcription completes; it just stops, with no error message.
And for files of 30 minutes or longer, no model that I tested can generate SRT files.
CUDA toolkit: 11.8, cuDNN: 8.2, torch: '1.11.0+cu113'; zlibwapi.dll and the cuDNN path are properly set. whisper-ctranslate2 version: up to date.
The transcription is indeed a lot faster; I do hope this small error can be tracked down and fixed :)
> One other issue: live_transcribe also does not work as expected. There is no response after running live mode; it seems the volume is too low. I changed Threshold = 0.01, and then the program started to work, but the recognition results were very poor.
Can you try the live transcription providing the language? Since the audios are short, this usually improves results a lot. Can you confirm if this makes any difference for you? Thanks
@seset
My current hypothesis is that there is a problem with CTranslate2 and the GPU on Windows due to CUDA or CTranslate2.
To validate or invalidate this hypothesis, could you try running this simple code:
- Download https://raw.githubusercontent.com/jordimas/calaix-de-sastre/master/faster-whisper/inference.py
- Edit the file and change "file.mp3" to your file
- Run it using python3 inference.py
And tell me if you can reproduce the problem with this simple code. Thanks
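For readers following along, the linked inference.py is essentially a minimal faster-whisper call. A sketch of that shape, with the API names taken from the faster-whisper README (the exact arguments here are assumptions, not the linked script verbatim):

```python
# Minimal faster-whisper sanity check. The import is guarded so the
# sketch degrades gracefully where faster-whisper is not installed.
try:
    from faster_whisper import WhisperModel
except ImportError:
    WhisperModel = None  # faster-whisper not installed in this environment

def transcribe_file(path, model_size="small", device="cuda"):
    """Transcribe `path` and return the list of segment texts."""
    if WhisperModel is None:
        raise RuntimeError("run: pip install faster-whisper")
    model = WhisperModel(model_size, device=device, compute_type="float16")
    segments, _info = model.transcribe(path)
    texts = []
    for seg in segments:
        # `segments` is a generator: iterating it is what actually runs the model
        print("[%.2fs -> %.2fs] %s" % (seg.start, seg.end, seg.text))
        texts.append(seg.text)
    return texts

if __name__ == "__main__" and WhisperModel is not None:
    transcribe_file("file.mp3")  # change "file.mp3" to your file
```

If this prints segments but whisper-ctranslate2 writes no SRT, the problem is in the output-writing path rather than in CTranslate2/CUDA.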
The code seems to work with no problems for me.
> One other issue: live_transcribe also does not work as expected. There is no response after running live mode; it seems the volume is too low. I changed Threshold = 0.01, and then the program started to work, but the recognition results were very poor.
> Can you try the live transcription providing the language? Since the audios are short, this usually improves results a lot. Can you confirm if this makes any difference for you? Thanks
I just tried the latest version (0.22), providing the language English (en) and Chinese (zh). With the default setting, Threshold = 0.2, it still can't record anything. I am not sure if it is a microphone issue; I just use the laptop mic.
I adjusted Threshold = 0.05 and now the program responds, but the results are very poor compared to normally transcribing a video. Is this normal?
> @seset
> My current hypothesis is that there is a problem with CTranslate2 and the GPU on Windows due to CUDA or CTranslate2.
> To validate or invalidate this hypothesis, could you try running this simple code:
> - Download https://raw.githubusercontent.com/jordimas/calaix-de-sastre/master/faster-whisper/inference.py
> - Edit the file and change "file.mp3" to your file
> - Run it using python3 inference.py
> And tell me if you can reproduce the problem with this simple code. Thanks
Using that .py there is no output at all, no matter which model I use. I also tried faster-whisper directly in another venv and encountered the same problem. Can you please advise the exact CUDA toolkit and cuDNN versions you use, as well as torch and torchvision? I do want to try to get this nice code working first.
> I just tried the latest version (0.22), providing the language English (en) and Chinese (zh). With the default setting, Threshold = 0.2, it still can't record anything. I am not sure if it is a microphone issue; I just use the laptop mic.
> I adjusted Threshold = 0.05 and now the program responds, but the results are very poor compared to normally transcribing a video. Is this normal?
In the next version it will be possible to adjust the volume threshold https://github.com/Softcatala/whisper-ctranslate2/commit/0023a2eea6de76c0d6bf646266dfd4d7371e7155. You will not need to edit the code.
Have you tried specifying the language and using the same model that you use for file transcription?
Yes, I specified the language (en and zh); no improvement.
Version 0.2.6 should fix this.
I tried a few files; sometimes the SRT file is not generated. I used --output_format srt.
I also tried to debug it and set breakpoints at the following lines, and found that execution may never reach there, with no error or exception. It's strange.
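For anyone debugging the writer path: a valid SRT file is just numbered blocks with `HH:MM:SS,mmm` timestamps, so the expected output is easy to reproduce independently. A short stdlib sketch (hypothetical helper names, not the tool's actual writer):

```python
def srt_timestamp(seconds):
    """Format seconds as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_srt(segments):
    """segments: iterable of (start, end, text) tuples; returns the SRT text."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)

print(write_srt([(0.0, 2.5, "Hello."), (2.5, 61.2, "World.")]))
```

Writing the transcribed segments through a helper like this can confirm whether the segments themselves are fine and only the file-writing step is being skipped.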