IllyaPysarchuk opened 1 year ago
I tried to do this, but I think it can only be done once OpenAI uploads the model to huggingface, maybe. (I couldn't find large-v3 on huggingface yet.)
The weights are open source, so it should be possible to upload them?
I think this is not only a conversion problem. The new large-v3 model uses 128 mel frequency bins instead of 80, which is hardcoded in faster-whisper right now.
Change the feature_size of the FeatureExtractor from 80 to 128.
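For reference, a minimal sketch of that change (assuming the FeatureExtractor constructor keeps exposing feature_size, as it does in current faster-whisper):

from faster_whisper.feature_extractor import FeatureExtractor

# large-v3 computes 128 mel bins instead of the 80 used by earlier models,
# so the hardcoded default has to be overridden (or patched in the source).
extractor = FeatureExtractor(feature_size=128)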
Could you submit that as a PR?
I kind of got it working by converting the .pt with the OpenAI-to-HF converter script and then running the CT2 converter on that, plus the tokenizer.json copied from large-v2.
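In case it helps anyone reproduce this, the second step can also be done through CTranslate2's Python API instead of the CLI; a sketch, where whisper-large-v3-hf stands in for wherever the OpenAI-to-HF script wrote its output:

from ctranslate2.converters import TransformersConverter

# Convert the locally produced Hugging Face checkpoint to CTranslate2 format.
converter = TransformersConverter("whisper-large-v3-hf")
converter.convert("faster-whisper-large-v3", quantization="float16")
# tokenizer.json still has to be copied over from large-v2 by hand.

Output from this first attempt: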
[23.60s -> 28.60s] You changed it to Ron when you bought your first Ron's coffee shop six years ago.
[28.60s -> 32.60s] Now you got 17 of them with eight more coming next quarter.
[32.60s -> 36.75s] May I help you with something?
[36.75s -> 40.75s] I like coming here because your Wi-Fi was fast.
[40.75s -> 44.75s] I mean, you're one of the few spots that has a fiber connection with gigabit speed.
[44.75s -> 46.75s] It's good.
[47.45s -> 47.97s] ,
[47.97s -> 48.47s] 제를,
[58.93s -> 73.47s] and I only, the Martinez, bad, I see, that's, is, and you know, it, it's, that, that, the negative, and I know, . . This, and I, inter, I look, William, the model, and I mean, I know throwing the society kettle, and, I, like, descon, the Mess, and, the, or the Rad, the head, I mean, he, and, but, it. So, you'reтом, I, the VI, the sub, I took, and, called the holes, and Iemente, so I, the cow такую, YouTube don't, and, the, the gear, collaborative,業vez. I welcome, the fortunate, and, و, the threekan, the handy, remote,ばい, degree,nem, Frank, de,б, "Con, �,rent, besoin, نہ, MR's,
[76.77s -> 81.03s] protocol it's not as anonymous as you think it is whoever's in control of the
[81.03s -> 86.67s] exit nodes is also in control of this traffic which makes me the one in
Then I tried copying over the config files from large-v2 (everything except the model files) and adjusting as necessary ("num_mel_bins": 128, "vocab_size": 51866). I didn't change any of the token ids.
[14.05s -> 49.17s] Uh, you're Ron, but your real name's Rohit Mehta. You changed it to Ron when you bought your first Ron's Coffee shop six years ago. Now you got 17 of them with eight more coming next quarter. May I help you with something? I like coming here because your Wi-Fi was fast. I mean, you're one of the few spots that has a fiber connection with gigabit speed. It's good. It's so good it scratched that part of my mind.
[49.17s -> 79.15s] reconnect the data, but it's real. It's all in one thing. Part that doesn't allow good to exist without condition. So I started intercepting all the traffic on your network. That's when I noticed something strange. That's when I decided to hack you. Hack? I know you run a website called Plato's Boys. Pardon me? You're using tor networking to keep the servers anonymous. You made it really hard for anyone to see it. But I saw it. The onion rooting protocol. It's not as anonymous as you think it is.
.....(eventually broke down)
[205.46s -> 235.44s] ,
[215.74s -> 215.88s] G.
[216.30s -> 216.74s] G,
[216.80s -> 221.62s] ,
[221.62s -> 221.66s] ,
[221.66s -> 221.98s] ,
Also did the second method with large-v2.pt and it works perfectly. Just gotta wait for the official HF release, but if you really want to get it working now, play around with tokenizer.json and the token ids in config.json.
Thanks!
Was there any confirmation that OpenAI will upload the model to huggingface?
Can you share the converted v3 model and the related modified files? Put them on some net drive, like Google Drive, so anyone who wants to use them can just copy them. Thanks!
According to this comment, it is converting now (https://github.com/openai/whisper/discussions/1762#discussioncomment-7496805)
Alright, let's go!
Hello. I wrote to Guillaume to see if he is willing to accept help maintaining the project. I have an old email address for Guillaume; if somebody has a recent one that works, please send it to me at jmas@softcatala.org
@jordimas Guillaume said to ping nguyendc-systran, so I did; let's see if he shows up.
this PR should work: https://github.com/guillaumekln/faster-whisper/pull/548
It doesn't; I just tested it, and the provided CT2 conversion is the same as my method 1 above.
[36.75s -> 40.75s] I like coming here because your Wi-Fi was fast.
[40.75s -> 44.75s] I mean, you're one of the few spots that has a fiber connection with gigabit speed.
[44.75s -> 46.75s] It's good.
[46.75s -> 75.81s] broadcast,, our good,something, plays,95, the, law, cancel, the, the team, the Bet, or, the, don't, the perfect, the peer, return, but, thenego, the ley, the gut, but, the, the, ,, 3, the time, the, ., but, ,, the, ,, ., ., ,, ,D, ,, , The ,, , ,, , ,, .,
Also, alignment doesn't work:
Traceback (most recent call last):
File "c:\git\faster-whisper-3\test.py", line 22, in <module>
for segment in segments:
File "c:\git\faster-whisper-3\faster_whisper\transcribe.py", line 947, in restore_speech_timestamps
for segment in segments:
File "c:\git\faster-whisper-3\faster_whisper\transcribe.py", line 563, in generate_segments
self.add_word_timestamps(
File "c:\git\faster-whisper-3\faster_whisper\transcribe.py", line 790, in add_word_timestamps
alignment = self.find_alignment(
File "c:\git\faster-whisper-3\faster_whisper\transcribe.py", line 900, in find_alignment
result = self.model.align(
RuntimeError: CUDA failed with error out of memory
just gotta wait for the hf release to do a proper conversion
Hmm, you're right. It returned correct results on the very short segments I tested but is nonsense on longer segments. Weird, I wonder why that is.
I think it's the tokenizer copied from large-v2; depending on where they put the new Cantonese token, a lot of the token ids could be offset.
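Once the official v3 repo is up this should be easy to verify; a sketch along these lines (the openai/whisper-large-v3 repo id is an assumption until then) would show whether the special-token ids drift after the inserted language token:

from transformers import WhisperTokenizer

# large-v3 inserts a new language token, which can shift the ids of every
# special token that comes after it in the vocabulary.
tok_v2 = WhisperTokenizer.from_pretrained("openai/whisper-large-v2")
tok_v3 = WhisperTokenizer.from_pretrained("openai/whisper-large-v3")
for token in ["<|translate|>", "<|transcribe|>", "<|notimestamps|>"]:
    print(token, tok_v2.convert_tokens_to_ids(token), tok_v3.convert_tokens_to_ids(token))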
FWIW, turning the temperature down to 0 has given me reproducible output across all the conversions I have tried so far. Previously it was random, frequently non-English text, which made me suspect language switching, but it's probably (hopefully) just a side effect of the tokens being off.
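Concretely that's just pinning the temperature in the transcribe call instead of using the default fallback schedule; something like this (audio.wav being whatever file you're testing):

# temperature=0 disables the random-sampling fallback, so decoding is
# greedy and the output is deterministic between runs.
segments, info = model.transcribe("audio.wav", temperature=0)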
[1.66s -> 23.60s] You're Ron, but your real name is Rohit Mehta.
[23.60s -> 28.60s] You changed it to Ron when you bought your first Ron's coffee shop six years ago.
[28.60s -> 32.60s] Now you got 17 of them with eight more coming next quarter.
[32.60s -> 36.75s] May I help you with something?
[36.75s -> 40.75s] I like coming here because your Wi-Fi was fast.
[40.75s -> 44.75s] I mean, you're one of the few spots that has a fiber connection with gigabit speed.
[44.75s -> 46.75s] It's good.
[47.51s -> 48.41s] ,
[49.23s -> 58.65s] I'm I'm I'm I'm I know, I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm a
[90.66s -> 103.54s] , I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I's I'm I's I'm I's I's I's I'm I's I's I's I's I's I's I's I's I's I's like, I's I's like, I's I's I's I's I's I's I's the
[295.81s -> 305.81s] , I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I,
[305.81s -> 315.83s] ,
The model is available now at https://huggingface.co/openai/whisper-large-v3 thanks to @sanchit-gandhi!
I tried converting it to CTranslate2 with the following command (same as used for v2):

ct2-transformers-converter --model openai/whisper-large-v3 --output_dir faster-whisper-large-v3 --copy_files tokenizer.json --quantization float16
However, I get this error:
OSError: openai/whisper-large-v3 does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
The huggingface repo does indeed only have model.safetensors for the model weights.
Anybody have a solution for this? Can we convert the .safetensors to .bin?
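For what it's worth, converting the weights themselves is straightforward; a sketch (assuming the safetensors package is installed and the file has been downloaded locally):

import torch
from safetensors.torch import load_file

# Read the weights from model.safetensors and re-save them in the legacy
# pytorch_model.bin format that the converter is looking for.
state_dict = load_file("model.safetensors")
torch.save(state_dict, "pytorch_model.bin")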
@thomasmol there is no tokenizer.json, only tokenizer_config.json. Renaming that didn't work, but I wrote a quick script to save the tokenizer and copy the files over:
from transformers import AutoTokenizer

# Loading the fast tokenizer and re-saving it writes out a tokenizer.json
# (among other files) that can then be copied into the converted model dir.
tokenizer = AutoTokenizer.from_pretrained("openai/whisper-large-v3")
tokenizer.save_pretrained("./whisper-large-v3")
And it seems to be working; uploading to HF now:
bababababooey/faster-whisper-large-v3
Hey, sorry to jump in at the last minute. What do I have to do to use this now? bababababooey/faster-whisper-large-v3
@User1231300 my fork https://github.com/bungerr/faster-whisper-3 should work in the meantime while we work on getting #548 merged
from faster_whisper import WhisperModel

# Load the converted large-v3 model on GPU in half precision.
model = WhisperModel(
    'large-v3', device="cuda", compute_type="float16"
)
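From there the usual faster-whisper loop applies; a quick usage sketch (audio.wav is a placeholder):

segments, info = model.transcribe("audio.wav", beam_size=5)
print("Detected language:", info.language)
# Segments are generated lazily; iterating triggers the actual transcription.
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")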
Thanks a lot for the effort, been waiting for this and will try it later. Had some quick tests with the official large-v3 yesterday, but the performance was not very satisfactory, with more errors and duplicates when transcribing Japanese.
@thomasmol check out this repo. It has a pytorch_model.bin file.
Thanks to everyone for their contributions to whisper-v3!
I found some mismatches between v2 and v3 in whisper.c in CTranslate2, so I fixed them. Compile this package if you need to use multilingual mode with faster-whisper and large-v3.
Can you please give more info on how I can do this?
@circuluspibo want to make a PR to upstream? it feels like that will resolve a lot of issues. (oops, missed that it was in the other PR too!)
There was already https://github.com/OpenNMT/CTranslate2/pull/1530 fixing that issue (among others).
The model is available now at https://huggingface.co/openai/whisper-large-v3 thanks to @sanchit-gandhi ! I tried converting it to CT translate with the following command:
ct2-transformers-converter --model openai/whisper-large-v3 --output_dir faster-whisper-large-v3 \ --copy_files tokenizer.json --quantization float16
(same as used for v2) However, I get this error:OSError: openai/whisper-large-v3 does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
The huggingface repo does indeed only have
model.safetensors
for the model weights. Anybody have a solution for this? Can we convert the .safetensors to .bin?
Pursuant to the conversation I STARTED HERE, they graciously uploaded the Float32 version, and I believe that the .bin files are up there now. However, they need to be combined before trying to convert, is that correct? Here's an example regarding a different model:
On Windows:
COPY /B dolphin-2.2-70b.Q8_0.gguf-split-a + dolphin-2.2-70b.Q8_0.gguf-split-b dolphin-2.2-70b.Q8_0.gguf
del dolphin-2.2-70b.Q8_0.gguf-split-a dolphin-2.2-70b.Q8_0.gguf-split-b

On Linux/macOS:
cat dolphin-2.2-70b.Q8_0.gguf-split-* > dolphin-2.2-70b.Q8_0.gguf && rm dolphin-2.2-70b.Q8_0.gguf-split-*
Assuming that we have the .bin... as far as converting (either the float32 or float16), the CTranslate2 repository is working on it right now, and I think they're close to a solution if not complete. See HERE.
I'm no expert, but maybe wait to see how the converter is ultimately modified in CTranslate2, since faster-whisper relies on it?
Interested in helping any way I can. Thanks!
Standalone Faster-Whisper r160.3 now supports large-v3, only for Windows atm.
Nice, I'll take a look. Does it use float32 or float16, or both?
Models? The int8_float32 model by default; if you want the float16 model, then type --model=large-v3-fp16.
Cool, I'll check the --help, but thanks for the tip. How did you implement large-v3 so quickly, or is it a trade secret? I know the people at the CTranslate2 GitHub have been working on it; maybe they solved it and you implemented it? I'd like to use large-v3 in a Python script, not the CLI, but if you did it in a proprietary way I can respect that...
Standalone Faster-Whisper r160.3 now supports large-v3, only for Windows atm.

Cannot find any executable files here.
@User1231300 my fork https://github.com/bungerr/faster-whisper-3 should work in the meantime while we work on getting #548 merged
Hey, thank you very much for this. I noticed it's English-only. How can I make it work for other languages too? I need Italian.
Executables are in Releases; it's at the right side of the page.
Guillaume started a job as Machine Learning Engineer at Apple last month (which he absolutely deserved to get), so I honestly don't think he'll have the time to continue his work on faster-whisper :(