ras0k closed this issue 1 year ago.
How does faster-whisper's speed [on GPU] compare to whisper-ConstMe?
I asked about whisper-ConstMe, not "openai/whisper". Btw, I find the large model's timestamps way less accurate than medium's, while the transcription is no better.
whisper-ConstMe can use any model.
Full benchmarks:
faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.
This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
Benchmark: for reference, here's the time and memory usage required to transcribe 13 minutes of audio using different implementations:
Implementations tested: openai/whisper@6dea21fd, whisper.cpp@3b010f9, faster-whisper@cce6b53e
Executed with CUDA 11.7.1 on a NVIDIA Tesla V100S.
Executed with 8 threads on an Intel(R) Xeon(R) Gold 6226R.
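For readers who want to try it, here is a minimal sketch of calling faster-whisper from Python (the model name, audio file, and compute type are illustrative, not the benchmark's exact setup):

```python
# Minimal faster-whisper sketch (illustrative, not the benchmark's exact setup).
# Requires: pip install faster-whisper
from faster_whisper import WhisperModel

# "int8_float16" enables the 8-bit quantization mentioned above on GPU;
# use compute_type="int8" on CPU, or "float16" for the unquantized GPU path.
model = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")

segments, info = model.transcribe("audio.wav", beam_size=5)
print("Detected language:", info.language)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```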
How much faster is ConstMe compared to openai/whisper?
If it ran for me I wouldn't ask. Why are you posting these pointless posts?
Because this would help the software? Why would we not integrate faster-whisper? Why are you so adversarial to contributions to an open-source project?
I asked a question; you answered with some irrelevant posts. I'm adversarial to nonsense...
I think const-me/whisper is a Windows port of the whisper.cpp implementation, which in turn is a C++ port of OpenAI's Whisper automatic speech recognition (ASR) model.
Are you a GPT-3 bot tuned on 4chan?
How does faster-whisper's speed [on GPU] compare to whisper-ConstMe?
at least 5x faster on CPU and 10x faster if you use GPU
So, a few minutes ago you didn't know what whisper-ConstMe is, and now you are posting "benchmarks" out of your ass...
How was my post irrelevant? It's a port of whisper.cpp, and the benchmark is testing whisper.cpp.
If you are not a bot then clearly one with some mental deficiency.
The benchmarks are from the repo, and I am autistic.
I see... Take ten deep breaths and no need to type more posts. I'm unsubscribing from this thread.
I'm not sure why this post devolved into insults instead of mutual understanding.
whisper-ConstMe is a GPU-enabled implementation of Whisper. Does faster-whisper provide any benefits in terms of speed, accuracy, or GPU RAM usage compared to it? ConstMe is already integrated into SubtitleEdit, which is why the question is relevant.
Yes, it goes about 5x faster with the optimizations they provide; that is what the benchmarks I posted show, and they are on the faster-whisper GitHub. You can also use whisper-ctranslate2 directly.
or GPU RAM usage compared to it?
We also save a lot of VRAM, which means we can run large-v2 on 4 GB GPUs.
Btw, I find the large model's timestamps way less accurate than medium's, while the transcription is no better.
Maybe for English medium is fine, but for multilingual use large-v2 is a lot more useful than having to download a specific model for each language.
Const-Me also has huge speed boosts over CPU-only implementations. I'll assume ConstMe and Faster Whisper are comparable unless someone reports data to the contrary.
Can you provide a benchmark that shows this? I am talking about 5x speed on GPU vs GPU, not CPU vs GPU.
Const-me is a GPU implementation of CPP that is much faster.
David M's experience here: https://www.youtube.com/watch?v=RRF5AS6JVtI&list=PLG8jlFKr-RtdO_r3YAp9cncEEqJRkIltB&index=83
I used CPP on my GTX 1050/Intel i7-7700HQ laptop on a 2:35 file: 13 seconds (tiny.en model). ConstMe was about 5 seconds.
With the base model and the same 2:35 file:
Const-me: 8 seconds
CPP: 22 seconds
OpenAI (Python, no GPU): 52 seconds
Larger models may show more difference but my GPU only has 4 GB of RAM.
I don't see a need to test Faster Whisper unless it gets embedded with SubtitleEdit. Feel free to test it and report results.
Const-Me is whisper.cpp, which is a CPU-only implementation, no? whisper.cpp is in the benchmark.
I understand and respect your desire not to ponder this, but for me a tiny benchmark is completely useless. I am only talking about comparing whisper (in GPU mode) and faster-whisper (also in GPU mode) on large-v2, because I believe there will be a use case for a lot of users. I will do my best to provide the benchmarks you asked for soon.
Larger models may show more difference but my GPU only has 4 GB of RAM.
You can already try large-v2 on faster-whisper with your GPU; is that not incentive enough to want it?
You are the one who wants this implementation of Whisper to be included, yet you haven't provided any test data to show how it is better than the current options.
I care about workflow, not absolute speed. If it's not in SubtitleEdit or integrated into my NLE I don't have any reason to use it as it will slow me down.
How much VRAM do you need for the large v2 model in Faster Whisper? That may limit its interest to users.
How much VRAM do you need for the large v2 model in Faster Whisper? That may limit its interest to users.
3.09 GB: https://huggingface.co/guillaumekln/faster-whisper-large-v2/tree/main
you haven't provided any test data to show how it is better than the current options.
I posted the full benchmarks up there in my first reply, but I will also try SubtitleEdit and post my results soon.
Is that the needed VRAM or the model size? I can't run the medium model (1.5 GB) on my 4 GB GPU FWIW.
Oh sorry, yes, model size. I am not sure about VRAM use; I will try right now, but if you take the time to read the benchmarks I posted, they say 4.8 GB or 3.1 GB depending on fp16 or int8.
Executed with CUDA 11.7.1 on a NVIDIA Tesla V100S.
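As a sketch of how that fp16/int8 choice looks in the faster-whisper Python API (illustrative; the VRAM figures are the repo's quoted numbers, not re-measured here):

```python
from faster_whisper import WhisperModel

# fp16 path: roughly the 4.8 GB figure quoted above for large-v2.
model_fp16 = WhisperModel("large-v2", device="cuda", compute_type="float16")

# 8-bit quantized path: roughly the 3.1 GB figure, which is what makes
# running large-v2 on a 4 GB GPU plausible.
model_int8 = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")
```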
Just FYI, right now I am testing the large model on ConstMe and it's about 4.2 GB, so medium should run on your 4 GB GPU; medium shows about 2.3 GB usage max.
my GPU is a 2060 6GB
For English medium.en is fine, but for French not even large works, so I really need large-v2 for its multilingual capabilities.
Do you want me to compare the speed of ConstMe vs Faster-Whisper on large, just for benchmarking purposes?
Please do compare them, that would be interesting.
I restarted SubtitleEdit and tried Const-me again with medium and this time it did work on my 4GB GPU. Last time it just "completed" but didn't produce a subtitle file. I haven't had time to troubleshoot and I mainly do transcriptions on my more powerful desktop machine anyway.
I don't see a need to test Faster Whisper unless it gets embedded with SubtitleEdit.
@rsmith02ct there is no problem with embedding it in SubtitleEdit; it's the same as OpenAI, you just need to download the models manually, as described there: Faster-Whisper.
Did you actually try it? From that method I gather we can use CPU, but I'm not sure if it's using GPU.
I can't test anything with GPU. Faster-Whisper should work the same as OpenAI: if CUDA is detected then it should use GPU, if not then CPU. Check your GPU usage, and check whether OpenAI runs on GPU.
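A small sketch of how that CUDA check can be made explicit from Python (hedged: ctranslate2.get_cuda_device_count() and device="auto" are from the CTranslate2/faster-whisper APIs as I understand them, not something tested in this thread):

```python
import ctranslate2
from faster_whisper import WhisperModel

# CTranslate2 is the engine behind faster-whisper; if this prints 0,
# transcription will run on CPU.
print("CUDA devices visible:", ctranslate2.get_cuda_device_count())

# device="auto" picks CUDA when available and falls back to CPU otherwise.
model = WhisperModel("medium", device="auto")
```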
OK, I've added a test with Whisper CTranslate2: https://github.com/jordimas/whisper-ctranslate2
Included in latest beta: https://github.com/SubtitleEdit/subtitleedit/releases/download/3.6.12/SubtitleEditBeta.zip
Note: Models will be downloaded the first time a model is used!
🐐
So I located the whisper-ctranslate2.exe that I had from my previous tests, but I'm not sure it's taking the right model; I am looking for https://huggingface.co/guillaumekln/faster-whisper-large-v2 and all I see is large, and it doesn't seem to be downloading anything.
For a compiled version, is Whisper-Faster-v2023.03.31-b77 what I should be using? [Edit: changed the name of the exe to what SE is looking for but it doesn't seem to work. I need a compiled binary from somewhere]
With the base model and the same 2:35 file:
Const-me: 8 seconds
CPP: 22 seconds
FasterWhisper (CPU): 33 seconds [note this is an older build and not from https://github.com/jordimas/whisper-ctranslate2 as I can't compile it]
OpenAI (Python, no GPU): 52 seconds
Could the download and transcribe steps be separate? Right now it says "transcribing audio to text" but in reality it's attempting to download a new model. 15 minutes later it's still downloading but says transcribing, which is confusing.
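One way the two steps could be separated (a sketch using huggingface_hub; the repo id is the one linked earlier in this thread, the audio path is illustrative):

```python
from huggingface_hub import snapshot_download
from faster_whisper import WhisperModel

# Step 1: download the converted model explicitly, so progress reporting can
# honestly say "downloading" rather than "transcribing".
model_dir = snapshot_download("guillaumekln/faster-whisper-large-v2")

# Step 2: transcribe from the already-downloaded local copy.
model = WhisperModel(model_dir, device="auto")
segments, info = model.transcribe("audio.wav")
```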
For a compiled version is Whisper-Faster-v2023.03.31-b77 what I should be using?
You can use it with both the "OpenAI" and "CTranslate2" options. Just for "CTranslate2", rename whisper.exe to whisper-ctranslate2.exe. And for "CTranslate2" you don't need to have real or fake OpenAI models.
Notes:
We are talking about standalone builds: whisper-standalone-win.
By default Whisper-Faster-v2023.03.31-b77 looks for models in the same folder, and they should be in folders named like this -> _faster-whisper_medium. It won't autodownload models; get them from https://huggingface.co/guillaumekln
Thanks so much! This is the standalone build I downloaded: https://github.com/Purfview/whisper-standalone-win/releases/tag/v2023.03.31-b77-faster
SubtitleEdit thinks the models are here: C:\Users\Roger\.cache\whisper, but actually they should be where you say, in _faster-whisper_tiny.en for example, and not named like the .py ones but just model.bin and the other files. Otherwise it gives an error.
and after all that...
2:35 file; base model:
Const-me: 8 seconds
Whisper-faster: 16 seconds
CPP: 22 seconds
OpenAI (Python): 52 seconds
Will test on my desktop where CUDA support should work, unlike on this laptop.
Const-me. 8 seconds Whisper-faster: 16 seconds
So on CPU it's even faster than Const-me, if you didn't subtract the ~12s startup delay.
I just had a chance to set it up on my desktop as well. It's an Intel i5-13600K with RTX 2080 Super GPU and CUDA works with SubtitleEdit.
Same file (2:35) with base models:
Const-me: 2.5s
Ctranslate2: 4s
CPP: 8s
OpenAI: 11s
For this sample Ctranslate2 had higher quality output recognizing difficult proper names even with the base model.
I also tried the large model.
Ctranslate2: 17s
Const-me: 19s
CPP: 1m47s
CPP seems to use more cores than it used to; all 14 were in use at times.
I tried it on a Japanese file but got an error with Faster Whisper.
From the log:
Date: 04/13/2023 12:56:27
SE: 3.6.12.60 - Microsoft Windows NT 10.0.22621.0 - 64-bit
Message: Calling whisper (CTranslate2) with:
C:\Users\rsmit\Dropbox\transfer settings\Whisper-Faster\Whisper-Faster\whisper-ctranslate2.exe --language ja --model "large" "D:\Temp\fe4f2032-7730-4ce3-95b2-e5e59434827b.wav"
Traceback (most recent call last):
  File "D:\whisper-fast__main__.py", line 406, in <module>
  File "D:\whisper-fast__main__.py", line 399, in cli
  File "encodings\cp1252.py", line 19, in encode
UnicodeEncodeError: 'charmap' codec can't encode characters in position 26-45: character maps to <undefined>
[21096] Failed to execute script 'main' due to unhandled exception!
Calling whisper CTranslate2 done in 00:00:09.0137460. Loading result from STDOUT
OpenAiWhisper also doesn't work right with Japanese (Const-me and CPP work without error, though Const-me's timings are quite messed up, leaving CPP as the best option for Japanese).
Date: 04/13/2023 13:01:15
SE: 3.6.12.60 - Microsoft Windows NT 10.0.22621.0 - 64-bit
Message: Calling whisper (OpenAI) with:
C:\Users\rsmit\Dropbox\transfer settings\Whisper-OpenAI\whisper.exe --language ja --model "medium" "D:\Temp\42463c7d-fbb9-415a-a1c7-f1f7b3aed181.wav"
Traceback (most recent call last):
  File "D:\whisper__main__.py", line 4, in <module>
  File "whisper\transcribe.py", line 314, in cli
  File "whisper\transcribe.py", line 209, in transcribe
  File "whisper\transcribe.py", line 170, in add_segment
  File "encodings\cp1252.py", line 19, in encode
UnicodeEncodeError: 'charmap' codec can't encode characters in position 26-45: character maps to <undefined>
[19592] Failed to execute script 'main' due to unhandled exception!
Calling whisper OpenAI done in 00:00:09.0315877. Loading result from STDOUT
I tried it on a Japanese file but got an error with Faster Whisper. UnicodeEncodeError: 'charmap' codec can't encode characters in position
Thanks for the report. I've used a custom "quick-hack" command-line interface for it, as faster-whisper doesn't come with a CLI. Maybe today I'll compile it with a proper CLI. Could you share a short sample of that Japanese audio for tests?
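For what it's worth, this class of crash comes from Python printing UTF-8 text (the Japanese segments) to a Windows console that defaults to cp1252. A common fix in a CLI wrapper is to force UTF-8 output; a sketch, not necessarily what the standalone build does:

```python
import sys

# Reconfigure stdout/stderr to UTF-8 so printing Japanese text does not raise
# UnicodeEncodeError under the default Windows cp1252 code page (Python 3.7+).
# Setting the PYTHONIOENCODING=utf-8 environment variable is an alternative.
sys.stdout.reconfigure(encoding="utf-8", errors="replace")
sys.stderr.reconfigure(encoding="utf-8", errors="replace")
```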
Hello,
For this sample Ctranslate2 had higher quality output recognizing difficult proper names even with the base model.
I suppose each implementation is running with its default parameters. Note that they have a different performance/quality trade-off by default. For example whisper-ctranslate2 (faster-whisper) is using a beam size of 5 by default (higher quality, slower) while whisper.cpp and Const-me/Whisper are using a beam size of 1 by default (lower quality, faster).
This should be considered when comparing the transcription time. Ideally they should all use the same parameters.
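In practice that means pinning the beam size when timing the implementations against each other; a sketch with faster-whisper (parameter name per its Python API, model and audio file illustrative):

```python
from faster_whisper import WhisperModel

model = WhisperModel("base", device="auto")

# Force greedy decoding (beam size 1) to match the whisper.cpp and
# Const-me/Whisper defaults, instead of faster-whisper's default of 5.
segments, info = model.transcribe("audio.wav", beam_size=1)
```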
Hi Purfview, for Japanese I think you can just use any audio and tell it that it's Japanese. If you want to use the file I have been testing on (to also assess quality), I can provide it as it's an old public video. link
Some quality issues are non-existent prefecture names (the worst are nonsensical; second-worst is 三重県, which is not what the guy is saying, and correct is 宮城県), and kaki should be 牡蠣 (oyster), not the fruit 柿.
Very interesting on performance/quality. Is that something SubtitleEdit could let us set with a slider?
Ideally they should all use the same parameters.
the man, the myth, the legend.
@rsmith02ct @niksedk
Same error with OpenAI; it's SubtitleEdit's issue. I think, instead of "Loading result from STDOUT", it should load the srt file, same as CPP.
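A sketch of what loading from a file instead of STDOUT could look like, assuming whisper-ctranslate2 mirrors openai/whisper's --output_format and --output_dir flags (paths and filenames are illustrative):

```python
import subprocess

# Ask the CLI to write an .srt next to the temp audio instead of parsing the
# transcript from stdout, which sidesteps console encoding issues entirely.
subprocess.run([
    "whisper-ctranslate2.exe", r"D:\Temp\audio.wav",
    "--language", "ja", "--model", "large",
    "--output_format", "srt", "--output_dir", r"D:\Temp",
], check=True)

# Read the UTF-8 subtitle file the CLI produced.
with open(r"D:\Temp\audio.srt", encoding="utf-8") as f:
    srt_text = f.read()
```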
I did not read the whole thread about Whisper on GPU, but can we avoid a lot of the problems with VRAM and speed by switching to faster-whisper, maybe?