Softcatala / whisper-ctranslate2

Whisper command line client compatible with original OpenAI client based on CTranslate2.
MIT License

sometimes srt file not generated #29

Closed iorilu closed 1 year ago

iorilu commented 1 year ago

I tried a few files, and sometimes the srt file is not generated. I used --output_format srt.

I also tried to debug it and set breakpoints at the following lines, but execution never seems to reach here, and there is no error or exception, which is strange.

            writer = get_writer(output_format, output_dir)
            writer(result, audio_path)
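As an aside for anyone setting breakpoints at the same spot: the stand-alone sketch below (it is not the project's writer, and the shape of the result dict is an assumption) produces the same kind of .srt file the writer call above is expected to create, which helps rule out plain file-writing or path problems.

    # Stand-alone sketch, not whisper-ctranslate2's writer: it mimics the .srt
    # output that writer(result, audio_path) is expected to produce, so disk
    # and path issues can be checked in isolation.
    import os

    def format_timestamp(seconds: float) -> str:
        """Render seconds as an SRT timestamp, e.g. 00:00:02,500."""
        ms = int(round(seconds * 1000))
        hours, ms = divmod(ms, 3_600_000)
        minutes, ms = divmod(ms, 60_000)
        secs, ms = divmod(ms, 1_000)
        return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

    def write_srt(result: dict, audio_path: str, output_dir: str = ".") -> str:
        """Write result["segments"] to <audio basename>.srt in output_dir."""
        base = os.path.splitext(os.path.basename(audio_path))[0]
        out_path = os.path.join(output_dir, base + ".srt")
        with open(out_path, "w", encoding="utf-8") as srt:
            for i, seg in enumerate(result["segments"], start=1):
                srt.write(f"{i}\n"
                          f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
                          f"{seg['text'].strip()}\n\n")
        return out_path

    # Assumed segment layout: start/end in seconds plus the recognized text.
    result = {"segments": [{"start": 0.0, "end": 2.5, "text": "hello world"}]}
    print(write_srt(result, "file.mp3"))  # -> ./file.srt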
jordimas commented 1 year ago

Hello. Yes, some users have reported this before. Which operating system are you using? Are you using CUDA or CPU? Thanks

iorilu commented 1 year ago

I am using Windows 10 with a 3060 GPU, and Miniconda as the environment.

The GPU is working; just the srt file is not generated.

One other issue is that live_transcribe also does not work as expected: there is no response after running live mode. It seems the volume is too small; I changed Threshold = 0.01 and then the program started to work, but the recognition results were very poor.

seset commented 1 year ago

Same problem! I can see all the transcription results, but sometimes there is no output file.

For me, with short files (under 10 minutes) and --model small (the default), everything goes fine: it shows "Transcription results written to 'D:\test' directory" and the srt files can be found in the current folder. But when using --model tiny or other models, there is no output at all after the transcription completes; it just stops, with no error message.

And for files 30 minutes or longer, no model that I tested can generate srt files.

CUDA toolkit: 11.8, cuDNN: 8.2, torch: '1.11.0+cu113'; zlibwapi.dll and the cuDNN path are properly set. whisper-ctranslate2 version: up to date.

The transcription is indeed a lot faster; I do hope this small error can be tracked down and fixed :)

jordimas commented 1 year ago

> One other issue is that live_transcribe also does not work as expected: there is no response after running live mode. It seems the volume is too small; I changed Threshold = 0.01 and then the program started to work, but the recognition results were very poor.

Can you try the live transcription providing the language? Since the audios are short, this usually improves things a lot. Can you confirm if this makes any difference for you? Thanks

jordimas commented 1 year ago

@seset

My current hypothesis is that there is a problem with GPU execution on Windows, caused either by CUDA or by CTranslate2.

In order to validate or invalidate this hypothesis, could you try to run this simple code:

  1. Download https://raw.githubusercontent.com/jordimas/calaix-de-sastre/master/faster-whisper/inference.py
  2. Edit the file and change "file.mp3" to your file
  3. Run it using python3 inference.py

And tell me if you can reproduce the problem with this simple code. Thanks
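A minimal test of this kind, written against the public faster-whisper API, looks roughly like the sketch below; the model size, device and compute_type here are assumptions, and the linked inference.py may differ.

    # Rough sketch of a minimal faster-whisper test script. The model size,
    # device and compute_type are assumptions; the linked inference.py may differ.
    from faster_whisper import WhisperModel

    model = WhisperModel("small", device="cuda", compute_type="float16")
    segments, info = model.transcribe("file.mp3")  # change "file.mp3" to your file

    print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
    for segment in segments:  # segments is a generator; transcription runs lazily here
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")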

iorilu commented 1 year ago

The code seems to work fine for me:

[Screenshot: Snipaste_2023-04-28_15-21-39]
iorilu commented 1 year ago

> > One other issue is that live_transcribe also does not work as expected: there is no response after running live mode. It seems the volume is too small; I changed Threshold = 0.01 and then the program started to work, but the recognition results were very poor.
>
> Can you try the live transcription providing the language? Since the audios are short, this usually improves things a lot. Can you confirm if this makes any difference for you? Thanks

I just tried the latest version (0.22), providing the languages English (en) and Chinese (zh). With the default setting, Threshold = 0.2, it still can't record anything. I am not sure if it's a microphone issue; I just use the laptop mic.

I adjusted Threshold = 0.05 and now the program responds, but the results are very poor compared to normally transcribing a video. Is this normal?
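For context on what this threshold does, the sketch below is a conceptual illustration of an RMS volume gate, not whisper-ctranslate2's actual live.py code: chunks quieter than the threshold are simply ignored, which is why a quiet laptop microphone can make live mode look unresponsive.

    import numpy as np

    # Conceptual sketch of a volume gate like the one being tuned above;
    # this is not whisper-ctranslate2's actual live.py code.
    def is_speech(block: np.ndarray, threshold: float) -> bool:
        """Return True when the audio block is loud enough to transcribe."""
        rms = float(np.sqrt(np.mean(block.astype(np.float32) ** 2)))
        return rms > threshold

    # A quiet microphone yields low-amplitude samples (synthetic here), so a
    # high threshold silently discards every chunk.
    quiet_block = 0.01 * np.random.randn(16000).astype(np.float32)
    print(is_speech(quiet_block, 0.2))    # False -> looks like "no response"
    print(is_speech(quiet_block, 0.005))  # True  -> the program starts reacting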

seset commented 1 year ago

> @seset
>
> My current hypothesis is that there is a problem with GPU execution on Windows, caused either by CUDA or by CTranslate2.
>
> In order to validate or invalidate this hypothesis, could you try to run this simple code:
>
>   1. Download https://raw.githubusercontent.com/jordimas/calaix-de-sastre/master/faster-whisper/inference.py
>   2. Edit the file and change "file.mp3" to your file
>   3. Run it using python3 inference.py
>
> And tell me if you can reproduce the problem with this simple code. Thanks

Using that .py there is no output at all, no matter which model I use. I also tried faster-whisper directly in another venv and encountered the same problem. Can you please advise the exact CUDA toolkit and cuDNN versions you use, as well as torch and torchvision? I do want to try to get this nice code working first.
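As a hedged suggestion for comparing environments, the standard introspection calls below print the CUDA/cuDNN versions each library actually sees; note that CTranslate2 loads CUDA and cuDNN from the system rather than from the torch package, so the torch build is not necessarily what faster-whisper is using.

    # Environment check using standard torch and ctranslate2 introspection calls.
    import torch
    import ctranslate2

    print("torch:", torch.__version__)
    print("torch built with CUDA:", torch.version.cuda)
    print("torch.cuda.is_available():", torch.cuda.is_available())
    print("cuDNN seen by torch:", torch.backends.cudnn.version())
    print("CUDA devices seen by CTranslate2:", ctranslate2.get_cuda_device_count())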

jordimas commented 1 year ago

> I just tried the latest version (0.22), providing the languages English (en) and Chinese (zh). With the default setting, Threshold = 0.2, it still can't record anything. I am not sure if it's a microphone issue; I just use the laptop mic.
>
> I adjusted Threshold = 0.05 and now the program responds, but the results are very poor compared to normally transcribing a video. Is this normal?

In the next version it will be possible to adjust the volume threshold: https://github.com/Softcatala/whisper-ctranslate2/commit/0023a2eea6de76c0d6bf646266dfd4d7371e7155. You will not need to edit the code.

Have you tried specifying the language and using the same model that you use for file transcription?

iorilu commented 1 year ago

Yes, I specified the languages en and zh; no improvement.

jordimas commented 1 year ago

Version 0.2.6 should fix this.