ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Write to file correctly when using `stream.exe` #261

Closed AB1908 closed 1 year ago

AB1908 commented 1 year ago

Hello folks, my use case is to redirect the output of stream.exe to a file and view that file in an editor like VS Code: essentially, live transcription that is stored correctly and can be read as it is written. I invoked it like so:

stream\stream.exe -m .\ggml-model-whisper-tiny.en.bin > ".\live.md"

in PowerShell. Am I doing it incorrectly?

ggerganov commented 1 year ago

You can use the -f switch to output to a file. For example, on Unix I do the following:

./stream -t 8 -m models/ggml-${model}.bin -f /tmp/whisper.out 2> /dev/null

The file /tmp/whisper.out contains the realtime output. You can then in a separate terminal read the last line of the file to get the latest transcribed text:

tail -n 1 /tmp/whisper.out

Not sure how this translates to PowerShell, though.
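For a shell-independent equivalent of `tail -n 1`, a small Python sketch along these lines might work. It is only an illustration: the function name `last_line` is hypothetical, and it assumes the file written by `-f` is UTF-8 text that is small enough to re-read in full on each call.

```python
from pathlib import Path


def last_line(path):
    """Return the last non-empty line of a text file, or "" if there is none.

    Assumption: the file is UTF-8; undecodable bytes are replaced rather
    than raising, since the file may be read while it is still being written.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    lines = [line for line in text.splitlines() if line.strip()]
    return lines[-1] if lines else ""
```

Calling `last_line("/tmp/whisper.out")` in a loop with a short sleep would approximate repeatedly running `tail -n 1`. Unlike `tail`, this version skips trailing blank lines.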

AB1908 commented 1 year ago

I was using a precompiled release since I was a tad impatient, but I'll run a few other tests when I can find the time. Meanwhile, I'll try using the equivalent of your tail idea. Thanks for the quick response!

AB1908 commented 1 year ago

I did some testing and tail can be hit or miss. Additionally, outputting to a file seems to store lines twice because of the way text generation appears to work. In the shell, it seems to go back and edit the text in place to produce a coherent transcription, whereas in the file, each revision is written out as a new line. Example to demonstrate:

 I have now gone back to...
 I have now gone back to a slower setting to the...
 improve the...
 improve the...
 transcription accuracy. [BLANK_AUDIO]
 transcription accuracy. So [BLANK_AUDIO]

vs

 I have now gone back to a slower setting to
 improve the transcription accuracy. So [BLANK_AUDIO]
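
The repeated lines in the file could be collapsed after the fact with a heuristic like the one sketched below. This is not something whisper.cpp provides; `collapse_partials` is a hypothetical helper that assumes a partial line is always followed directly by the line that repeats or extends it, and that `[BLANK_AUDIO]` and trailing dots can be ignored when comparing.

```python
import re

# Marker emitted for silence in the example above; ignored when comparing lines.
_BLANK = re.compile(r"\[BLANK_AUDIO\]")


def _normalize(line):
    """Strip the [BLANK_AUDIO] marker, trailing dots, and surrounding whitespace."""
    return _BLANK.sub("", line).strip().rstrip(".").strip()


def collapse_partials(lines):
    """Keep only the latest revision of each line.

    A line replaces the previously kept line when, after normalization,
    it starts with that previous line (i.e. it repeats or extends it).
    """
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        if kept and _normalize(line).startswith(_normalize(kept[-1])):
            kept[-1] = line  # newer revision supersedes the partial one
        else:
            kept.append(line)
    return kept
```

Applied to the six lines of the first example, this keeps the second, fourth, and sixth lines. The result is close to, but not identical to, what the terminal shows, and the heuristic would wrongly merge two genuinely separate lines if one happened to start with the other.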

ggerganov commented 1 year ago

You can try the new "Sliding window" mode in stream. Not sure if this is what you are looking for, but it might be useful.

mrmachine commented 1 year ago

When writing to a file in "sliding window" mode, the transcription block headers and timestamps are missing, which makes the block-level offsets on each line useless.

With this command:

./stream --step 0 --model models/ggml-tiny.en.bin --file voice.txt

I get this stdout:

[Start speaking]
### Transcription 0 START | t0 = 48 ms | t1 = 8944 ms

[00:00.000 --> 00:08.000]   Okay, from now, thank you very much, after.

### Transcription 0 END

### Transcription 1 START | t0 = 21640 ms | t1 = 31640 ms

[00:00.000 --> 00:04.096]   been running a soft no campaign since the election, so the announcement of the official
[00:04.096 --> 00:09.088]   no wasn't exactly unexpected. Now not supporting the voices one thing.

### Transcription 1 END

### Transcription 2 START | t0 = 42582 ms | t1 = 52582 ms

[00:00.000 --> 00:04.092]   not all Indigenous people support a voice to Parliament or indeed the process.
[00:04.092 --> 00:09.036]   They are going to be plenty of views. This isn't about that though.

### Transcription 2 END

### Transcription 3 START | t0 = 46240 ms | t1 = 56240 ms

[00:00.000 --> 00:06.024]   or indeed the process, they're going to be plenty of views. This isn't about that though.
[00:06.024 --> 00:10.000]   Everyone is going to make up their own mind about the boys.

### Transcription 3 END
^C

And this logged to voice.txt:

[00:00.000 --> 00:08.000]   Okay, from now, thank you very much, after.

[00:00.000 --> 00:04.096]   been running a soft no campaign since the election, so the announcement of the official
[00:04.096 --> 00:09.088]   no wasn't exactly unexpected. Now not supporting the voices one thing.

[00:00.000 --> 00:04.092]   not all Indigenous people support a voice to Parliament or indeed the process.
[00:04.092 --> 00:09.036]   They are going to be plenty of views. This isn't about that though.

[00:00.000 --> 00:06.024]   or indeed the process, they're going to be plenty of views. This isn't about that though.
[00:06.024 --> 00:10.000]   Everyone is going to make up their own mind about the boys.

Is this expected?

ggerganov commented 1 year ago

Not expected. As a workaround, you can try dropping the -f flag and instead redirecting standard output to a file and parsing that.

Will close this, as the original problem is resolved. Feel free to open a new issue if the workaround is not enough.
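
Parsing the redirected standard output, as suggested in the workaround, could look roughly like the sketch below. It is based only on the log format quoted earlier in this thread (`### Transcription N START | t0 = … ms | t1 = … ms` headers followed by `[mm:ss.mmm --> mm:ss.mmm]` segment lines); the function name `parse_stream_output` is hypothetical, and the format may differ between versions of stream.

```python
import re

# Patterns derived from the stdout sample quoted in this thread (an assumption,
# not a documented format).
HEADER = re.compile(r"^### Transcription (\d+) START \| t0 = (\d+) ms \| t1 = (\d+) ms")
SEGMENT = re.compile(r"^\[(\d+):(\d+)\.(\d+) --> (\d+):(\d+)\.(\d+)\]\s*(.*)$")


def _to_ms(minutes, seconds, millis):
    """Convert mm, ss, mmm string fields to milliseconds."""
    return (int(minutes) * 60 + int(seconds)) * 1000 + int(millis)


def parse_stream_output(lines):
    """Yield (block_index, abs_start_ms, abs_end_ms, text) per segment line.

    Each segment's offsets are relative to its block, so the block's t0
    is added to turn them into offsets from the start of the recording.
    """
    block, t0 = None, 0
    for raw in lines:
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            block, t0 = int(m.group(1)), int(m.group(2))
            continue
        m = SEGMENT.match(line)
        if m and block is not None:
            start = _to_ms(*m.group(1, 2, 3))
            end = _to_ms(*m.group(4, 5, 6))
            yield block, t0 + start, t0 + end, m.group(7)
```

Passing an open file object for the redirected output to `parse_stream_output` gives each segment together with its block index and absolute offsets, which is the information missing from the `-f` output. Lines such as `[Start speaking]` and the `END` markers are simply skipped.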