Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Mozilla Public License 2.0

A little advice #43

Closed martjay closed 1 year ago

martjay commented 1 year ago

I would like the ability to run batch file tasks. Also, if real-time speech recognition with low latency is implemented, could we do desktop captioning? That way we could watch videos with real-time translation.

Const-me commented 1 year ago

@martjay Currently, there are two ways to implement batching.

  1. Use the command-line application main.exe from the cli.zip archive. It accepts many input files and transcribes them one by one.

  2. Write a custom application that consumes the DLL. I think the C# API is easier to use than the C++ COM API; use the WhisperNet NuGet package. Note that the package requires a modern version of .NET, 6.0 or newer. It is not compatible with the legacy .NET Framework 4.

As for desktop captioning, look for third-party software that creates a virtual microphone to capture the audio output. However, latency is not great in the current version: it is several seconds.

martjay commented 1 year ago

main.exe closes immediately after I run it. I can't program, sigh~

Const-me commented 1 year ago

@martjay main.exe is a console application. Press Win+R, type cmd, and press Enter. Use the cd command to navigate to the directory that contains main.exe. First run main.exe -h; it will print the list of supported command-line parameters with short descriptions. Then run main.exe once more, this time specifying the model, one or more input audio files, and any optional parameters. Example:

main.exe -m D:\Data\Whisper\ggml-medium.bin -otxt -nc -nt C:\Z\Fun\OpenAI\Whisper\SampleClips\jfk.wav

Attaching a screenshot (cli-example-usage).
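For anyone who prefers to script the batch run rather than type the file list by hand, here is a minimal Python sketch. It only assembles the same kind of command line as the example above (the -m, -otxt, -nc and -nt switches are copied from that example) and passes every matching file in a folder as an input. The function names and the `*.wav` pattern are assumptions made for illustration; check `main.exe -h` for the switches your build supports.

```python
import subprocess
from pathlib import Path


def build_command(exe: Path, model: Path, inputs: list[Path]) -> list[str]:
    """Build a single main.exe invocation that lists every input file."""
    # Switches copied from the example command in this thread.
    return [str(exe), "-m", str(model), "-otxt", "-nc", "-nt", *map(str, inputs)]


def transcribe_folder(exe: Path, model: Path, folder: Path, pattern: str = "*.wav") -> int:
    """Run main.exe once over all files in `folder` matching `pattern`.

    Returns the process exit code, or 0 when there is nothing to transcribe.
    """
    inputs = sorted(folder.glob(pattern))
    if not inputs:
        return 0
    # Passing a list (not a string) avoids quoting problems with spaces in paths.
    return subprocess.run(build_command(exe, model, inputs), check=False).returncode
```

Using the full path to main.exe for `exe` also sidesteps the question of which directory the shell is currently in.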

martjay commented 1 year ago

'main.exe' is not recognized as an internal or external command, operable program or batch file.

Const-me commented 1 year ago

@martjay You should download that program from the Releases page of this repository, unpack cli.zip somewhere, then use the cd command to navigate to the folder that contains the unpacked main.exe.

martjay commented 1 year ago

I know. Maybe there is something wrong with my computer; maybe it's a problem with the system environment variables.

Const-me commented 1 year ago

@martjay Are you running a 64-bit version of Windows 10 or 11? Press Win+Pause/Break; you should see a window with the text “System Type: 64-bit operating system”.

If that’s correct, you are in the wrong directory in the cmd.exe shell. Use the dir command to list the files in the current directory; you should see main.exe and Whisper.dll there.

martjay commented 1 year ago

Powershell:

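PS D:\Downloads\SOFTWARE\字幕软件\WhisperDesktop> main.exe - h
main.exe : The term 'main.exe' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1

Suggestion [3,General]: The command main.exe was not found, but does exist in the current location. Windows PowerShell does not load commands from the current location by default. If you trust this command, instead type ".\main.exe". See "get-help about_Command_Precedence" for more details.
PS D:\Downloads\SOFTWARE\字幕软件\WhisperDesktop>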


Git bash:

Enya@DESKTOP-NIM9K91 MINGW64 /d/Downloads/SOFTWARE/字幕软件/WhisperDesktop
$ dir
Binary  Whisper.dll  main.exe  output  Include  WhisperDesktop.exe  model  Library.zip  WhisperDesktop.rar  opt_01_vorproduktion.avi  Linker  cli.zip  opt_01_vorproduktion.wav

Enya@DESKTOP-NIM9K91 MINGW64 /d/Downloads/SOFTWARE/字幕软件/WhisperDesktop
$ main.exe -h
bash: main.exe: command not found

Enya@DESKTOP-NIM9K91 MINGW64 /d/Downloads/SOFTWARE/字幕软件/WhisperDesktop
$


I am using Windows 11 PRO X64

Const-me commented 1 year ago

@martjay If you insist on using PowerShell instead of cmd.exe, use ./main.exe instead of main.exe. See the screenshot (ps-example-usage).

martjay commented 1 year ago

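Listing the directory worked, but:

main.exe : The term 'main.exe' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1

Suggestion [3,General]: The command main.exe was not found, but does exist in the current location. Windows PowerShell does not load commands from the current location by default. If you trust this command, instead type ".\main.exe". See "get-help about_Command_Precedence" for more details.
PS D:\Downloads\SOFTWARE\字幕软件\WhisperDesktop>

Maybe it's a problem with the system environment variables.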

martjay commented 1 year ago

It works now. One more question: what symbol is used to separate multiple files?

martjay commented 1 year ago

Another headache: sometimes parts of the audio are not recognized, and then a large block of text is repeated.

martjay commented 1 year ago

I use the command line for batch tasks and find it too cumbersome. I still hope you will implement this in the GUI; it would be very convenient for everyone. Thank you, if you can.

Const-me commented 1 year ago

@martjay That GUI is also going to be very cumbersome, and completely different from the current WhisperDesktop. Ideally, it needs a wizard-like GUI where the user populates a long list of input files by importing files or folders, then reviews the list and assigns unique output paths.

You can try the new PowerShell wrapper I’ve created in version 1.10. It might be easier for your use case than the main.exe CLI.
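The two steps such a wizard would need (building the input list from files or folders, then assigning unique output paths) can be sketched without any GUI. The following Python is only an illustration of that idea, not code from WhisperDesktop; the extension filter, function names and the " (2)" naming scheme are all assumptions.

```python
from pathlib import Path

# Hypothetical filter; adjust to whatever formats your tool accepts.
MEDIA_EXTENSIONS = {".wav", ".mp3", ".mp4", ".avi", ".mkv"}


def collect_inputs(paths):
    """Expand a mix of files and folders into a flat list without duplicates."""
    seen, result = set(), []
    for p in map(Path, paths):
        # Folders are searched recursively; plain files are taken as given.
        candidates = sorted(f for f in p.rglob("*") if f.is_file()) if p.is_dir() else [p]
        for f in candidates:
            if f.suffix.lower() in MEDIA_EXTENSIONS and f not in seen:
                seen.add(f)
                result.append(f)
    return result


def assign_outputs(inputs, out_dir, ext=".txt"):
    """Map every input file to a distinct output path inside out_dir."""
    used, mapping = set(), {}
    for f in inputs:
        candidate = Path(out_dir) / (f.stem + ext)
        n = 2
        # Compare case-insensitively because Windows file names are.
        while str(candidate).lower() in used:
            candidate = Path(out_dir) / f"{f.stem} ({n}){ext}"
            n += 1
        used.add(str(candidate).lower())
        mapping[f] = candidate
    return mapping
```

Two inputs named clip.wav from different folders end up as clip.txt and clip (2).txt, which is the collision the review step in the wizard would otherwise have to catch by hand.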

tigros commented 1 year ago

ok i'm going to mention this again, might seem like double posting but see https://github.com/tigros/Whisperer. Oh and thank you Kosta, great coding!

martjay commented 1 year ago

What encoding do you use for your subtitles? I get garbled text when I choose to output Chinese subtitles. Also, why doesn't it automatically delete the .wav files after it finishes extracting the subtitles?

Selecting English subtitles gives the same garbled text. But when I put the media file in a path without Chinese characters, the subtitles are recognized properly and are not garbled, so I hope you can fix this. Another problem is that it doesn't delete the .wav files automatically, and it generates .wav files in the export directory whether or not the input is already a .wav.

tigros commented 1 year ago

ok i'm on it, i have an idea.

martjay commented 1 year ago

Hope you can add this feature too! Thank you man~

tigros commented 1 year ago

ok, fixed the unicode problem and it now deletes the wavs. as for placing the file in the same folder, will consider it.

tigros commented 1 year ago

i figured an easy way to do it, so same folder option is there now. thanks for suggestions!

martjay commented 1 year ago

Good job! Man, you are my hero. This software is very significant and will help many people to complete their studies. There are no words to describe how grateful I am to you all, you have done a very worthwhile job. Thank you again Const-me and Tigros!

martjay commented 1 year ago

I also have a bold idea: to implement dual language subtitles, with machine translated subtitles on top and subtitles in the source language on the bottom. This would eliminate misunderstandings caused by inaccurate machine translations. Possible process: generate machine translated subtitles and source language subtitles, possibly taking twice as long to recognise, and then synthesise a bilingual subtitle through some program.
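The last step of that process (combining the two subtitle files) could look roughly like the Python sketch below. It assumes both runs produced standard SRT files, and it pairs cues by overlapping time ranges because the two runs are not guaranteed to split the speech at the same points. All names here are hypothetical, and this is not a feature of Whisper or Whisperer.

```python
import re
from dataclasses import dataclass

TIME = re.compile(r"(\d+):(\d\d):(\d\d)[,.](\d{3})")


@dataclass
class Cue:
    start: int  # milliseconds
    end: int    # milliseconds
    text: str


def to_ms(stamp: str) -> int:
    h, m, s, ms = map(int, TIME.match(stamp.strip()).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def to_stamp(ms: int) -> str:
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"


def parse_srt(data: str) -> list[Cue]:
    """Parse SRT text into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:
                start, end = line.split("-->")
                cues.append(Cue(to_ms(start), to_ms(end), "\n".join(lines[i + 1:])))
                break
    return cues


def merge_bilingual(translated: list[Cue], source: list[Cue]) -> list[Cue]:
    """Keep the translated timing; put source text that overlaps it underneath."""
    merged = []
    for t in translated:
        overlap = [s.text for s in source if s.start < t.end and s.end > t.start]
        merged.append(Cue(t.start, t.end, "\n".join([t.text, *overlap])))
    return merged


def format_srt(cues: list[Cue]) -> str:
    return "\n\n".join(
        f"{i}\n{to_stamp(c.start)} --> {to_stamp(c.end)}\n{c.text}"
        for i, c in enumerate(cues, 1)
    ) + "\n"
```

A source cue that straddles two translated cues will appear under both, so the result still needs a quick review. When reading and writing the files, open them with an explicit encoding such as UTF-8 so that Chinese text survives the round trip.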

martjay commented 1 year ago

There are two bugs that need to be fixed.

  1. It cannot remember the model path or the output folder path. Perhaps it could automatically write the paths into a config file.
  2. If I use an English video and select Chinese as the recognition language, in theory all subtitles should be in Chinese. However, the first few sentences of the video are still subtitled in English, and English subtitles also seem to be interspersed in the middle of the video. This seems to happen with every video.

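
The config-file idea in the first point can be illustrated in a few lines. This Python sketch is only an example of the general approach and says nothing about how Whisperer stores its settings; the file location and key names are assumptions chosen by the caller.

```python
import json
from pathlib import Path


def load_settings(config_file):
    """Return the saved settings, or an empty dict if none can be read."""
    path = Path(config_file)
    if not path.exists():
        return {}
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return {}  # a damaged config should not stop the program from starting


def save_settings(config_file, **updates):
    """Merge `updates` into the saved settings and write them back."""
    settings = load_settings(config_file)
    settings.update(updates)
    path = Path(config_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    # UTF-8 with ensure_ascii=False keeps non-ASCII paths readable in the file.
    path.write_text(json.dumps(settings, indent=2, ensure_ascii=False), encoding="utf-8")
    return settings
```

Calling `save_settings(cfg, model_path=...)` whenever the user picks a path, and `load_settings(cfg)` at startup, is enough to restore the last model and output folder.
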
tigros commented 1 year ago

it remembers paths now, v2.2. thanks.

martjay commented 1 year ago

Thank you man