IllyaPysarchuk opened 1 year ago
I tried to do this, but I think it can only be done once OpenAI uploads the model to huggingface, maybe. (I couldn't find large-v3 on huggingface yet.)
The weights are open source, so it should be possible to upload them?
I think this is not only a conversion problem. The new large-v3 model uses 128 mel frequency bins instead of 80, which is hardcoded in faster-whisper right now.
Change the feature_size of the FeatureExtractor from 80 to 128.
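For reference, a minimal sketch of that change (assuming the FeatureExtractor constructor keeps exposing feature_size, as it does in current faster-whisper):

from faster_whisper.feature_extractor import FeatureExtractor

# large-v3 computes 128 mel bins instead of the 80 used by earlier models,
# so the hardcoded default has to be overridden (or patched in the source).
extractor = FeatureExtractor(feature_size=128)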
Could you submit that as a PR?
I kind of got it working by converting the .pt with the OpenAI-to-HF converter script and then running the CT2 converter on that, plus the tokenizer.json copied from large-v2.
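In case it helps anyone reproduce this, the second step can also be done through CTranslate2's Python API instead of the CLI; a sketch, where whisper-large-v3-hf stands in for wherever the OpenAI-to-HF script wrote its output:

from ctranslate2.converters import TransformersConverter

# Convert the locally produced Hugging Face checkpoint to CTranslate2 format.
converter = TransformersConverter("whisper-large-v3-hf")
converter.convert("faster-whisper-large-v3", quantization="float16")
# tokenizer.json still has to be copied over from large-v2 by hand.

Output from this first attempt: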
[23.60s -> 28.60s] You changed it to Ron when you bought your first Ron's coffee shop six years ago.
[28.60s -> 32.60s] Now you got 17 of them with eight more coming next quarter.
[32.60s -> 36.75s] May I help you with something?
[36.75s -> 40.75s] I like coming here because your Wi-Fi was fast.
[40.75s -> 44.75s] I mean, you're one of the few spots that has a fiber connection with gigabit speed.
[44.75s -> 46.75s] It's good.
[47.45s -> 47.97s] ,
[47.97s -> 48.47s] 제를,
[58.93s -> 73.47s] and I only, the Martinez, bad, I see, that's, is, and you know, it, it's, that, that, the negative, and I know, . . This, and I, inter, I look, William, the model, and I mean, I know throwing the society kettle, and, I, like, descon, the Mess, and, the, or the Rad, the head, I mean, he, and, but, it. So, you'reтом, I, the VI, the sub, I took, and, called the holes, and Iemente, so I, the cow такую, YouTube don't, and, the, the gear, collaborative,業vez. I welcome, the fortunate, and, و, the threekan, the handy, remote,ばい, degree,nem, Frank, de,б, "Con, �,rent, besoin, نہ, MR's,
[76.77s -> 81.03s] protocol it's not as anonymous as you think it is whoever's in control of the
[81.03s -> 86.67s] exit nodes is also in control of this traffic which makes me the one in
Then I tried copying over the config files from large-v2 (everything except the model files) and adjusting as necessary ("num_mel_bins": 128, "vocab_size": 51866). I didn't change any of the token ids.
[14.05s -> 49.17s] Uh, you're Ron, but your real name's Rohit Mehta. You changed it to Ron when you bought your first Ron's Coffee shop six years ago. Now you got 17 of them with eight more coming next quarter. May I help you with something? I like coming here because your Wi-Fi was fast. I mean, you're one of the few spots that has a fiber connection with gigabit speed. It's good. It's so good it scratched that part of my mind.
[49.17s -> 79.15s] reconnect the data, but it's real. It's all in one thing. Part that doesn't allow good to exist without condition. So I started intercepting all the traffic on your network. That's when I noticed something strange. That's when I decided to hack you. Hack? I know you run a website called Plato's Boys. Pardon me? You're using tor networking to keep the servers anonymous. You made it really hard for anyone to see it. But I saw it. The onion rooting protocol. It's not as anonymous as you think it is.
.....(eventually broke down)
[205.46s -> 235.44s] ,
[215.74s -> 215.88s] G.
[216.30s -> 216.74s] G,
[216.80s -> 221.62s] ,
[221.62s -> 221.66s] ,
[221.66s -> 221.98s] ,
Also did the second method with large-v2.pt and it works perfectly. Just gotta wait for the official HF release, but if you really want to get it working now, play around with tokenizer.json and the token ids in config.json.
Thanks!
Was there any confirmation that OpenAI will upload the model to huggingface?
Can you share the converted v3 model and the related modified files? Put them on some net drive, like Google Drive, so anyone who wants to use them can just copy them. Thanks!
According to this comment, it is converting now (https://github.com/openai/whisper/discussions/1762#discussioncomment-7496805)
Alright, let's go!
Hello. I wrote to Guillaume to see if he is willing to accept help maintaining the project. I have an old email address for Guillaume; if somebody has a recent one that works, please send it to me at jmas@softcatala.org
@jordimas Guillaume said to ping nguyendc-systran, so I did; let's see if he shows up.
this PR should work: https://github.com/guillaumekln/faster-whisper/pull/548
It doesn't; I just tested it, and the provided CT2 conversion is the same as my method 1 above.
[36.75s -> 40.75s] I like coming here because your Wi-Fi was fast.
[40.75s -> 44.75s] I mean, you're one of the few spots that has a fiber connection with gigabit speed.
[44.75s -> 46.75s] It's good.
[46.75s -> 75.81s] broadcast,, our good,something, plays,95, the, law, cancel, the, the team, the Bet, or, the, don't, the perfect, the peer, return, but, thenego, the ley, the gut, but, the, the, ,, 3, the time, the, ., but, ,, the, ,, ., ., ,, ,D, ,, , The ,, , ,, , ,, .,
Also, alignment doesn't work:
Traceback (most recent call last):
File "c:\git\faster-whisper-3\test.py", line 22, in <module>
for segment in segments:
File "c:\git\faster-whisper-3\faster_whisper\transcribe.py", line 947, in restore_speech_timestamps
for segment in segments:
File "c:\git\faster-whisper-3\faster_whisper\transcribe.py", line 563, in generate_segments
self.add_word_timestamps(
File "c:\git\faster-whisper-3\faster_whisper\transcribe.py", line 790, in add_word_timestamps
alignment = self.find_alignment(
File "c:\git\faster-whisper-3\faster_whisper\transcribe.py", line 900, in find_alignment
result = self.model.align(
RuntimeError: CUDA failed with error out of memory
just gotta wait for the hf release to do a proper conversion
Hmm, you're right. It returned correct results on the very short segments I tested but is nonsense on longer segments. Weird, I wonder why that is.
I think it's the tokenizer copied from large-v2; depending on where they put the new Cantonese token, a lot of the token ids could be offset.
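Once the official v3 repo is up this should be easy to verify; a sketch along these lines (the openai/whisper-large-v3 repo id is an assumption until then) would show whether the special-token ids drift after the inserted language token:

from transformers import WhisperTokenizer

# large-v3 inserts a new language token, which can shift the ids of every
# special token that comes after it in the vocabulary.
tok_v2 = WhisperTokenizer.from_pretrained("openai/whisper-large-v2")
tok_v3 = WhisperTokenizer.from_pretrained("openai/whisper-large-v3")
for token in ["<|translate|>", "<|transcribe|>", "<|notimestamps|>"]:
    print(token, tok_v2.convert_tokens_to_ids(token), tok_v3.convert_tokens_to_ids(token))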
FWIW, turning the temperature down to 0 has given me reproducible output across all the conversions I have tried so far. Previously it was random, frequently non-English text, which made me suspect language switching, but it's probably (hopefully) just a side effect of the tokens being off.
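Concretely that's just pinning the temperature in the transcribe call instead of using the default fallback schedule; something like this (audio.wav being whatever file you're testing):

# temperature=0 disables the random-sampling fallback, so decoding is
# greedy and the output is deterministic between runs.
segments, info = model.transcribe("audio.wav", temperature=0)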
[1.66s -> 23.60s] You're Ron, but your real name is Rohit Mehta.
[23.60s -> 28.60s] You changed it to Ron when you bought your first Ron's coffee shop six years ago.
[28.60s -> 32.60s] Now you got 17 of them with eight more coming next quarter.
[32.60s -> 36.75s] May I help you with something?
[36.75s -> 40.75s] I like coming here because your Wi-Fi was fast.
[40.75s -> 44.75s] I mean, you're one of the few spots that has a fiber connection with gigabit speed.
[44.75s -> 46.75s] It's good.
[47.51s -> 48.41s] ,
[49.23s -> 58.65s] I'm I'm I'm I'm I know, I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm a
[90.66s -> 103.54s] , I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I'm I's I'm I's I'm I's I's I's I'm I's I's I's I's I's I's I's I's I's I's like, I's I's like, I's I's I's I's I's I's I's the
[295.81s -> 305.81s] , I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I, I,
[305.81s -> 315.83s] ,
The model is available now at https://huggingface.co/openai/whisper-large-v3 thanks to @sanchit-gandhi!
I tried converting it to CTranslate2 with the following command (same as used for v2):

ct2-transformers-converter --model openai/whisper-large-v3 --output_dir faster-whisper-large-v3 --copy_files tokenizer.json --quantization float16
However, I get this error:
OSError: openai/whisper-large-v3 does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
The huggingface repo does indeed only have model.safetensors for the model weights.
Anybody have a solution for this? Can we convert the .safetensors to .bin?
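For what it's worth, converting the weights themselves is straightforward; a sketch (assuming the safetensors package is installed and the file has been downloaded locally):

import torch
from safetensors.torch import load_file

# Read the weights from model.safetensors and re-save them in the legacy
# pytorch_model.bin format that the converter is looking for.
state_dict = load_file("model.safetensors")
torch.save(state_dict, "pytorch_model.bin")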
@thomasmol there is no tokenizer.json, only tokenizer_config.json. Renaming that didn't work, but I wrote a quick script to save the tokenizer and copy the files over:
from transformers import AutoTokenizer

# Loading the fast tokenizer and re-saving it writes out a tokenizer.json
# (among other files) that can then be copied into the converted model dir.
tokenizer = AutoTokenizer.from_pretrained("openai/whisper-large-v3")
tokenizer.save_pretrained("./whisper-large-v3")
And it seems to be working; uploading to HF now:
bababababooey/faster-whisper-large-v3
Hey, sorry to jump in at the last minute. What do I have to do to use this now? bababababooey/faster-whisper-large-v3
@User1231300 my fork https://github.com/bungerr/faster-whisper-3 should work in the meantime while we work on getting #548 merged
from faster_whisper import WhisperModel

# Load the converted large-v3 model on GPU in half precision.
model = WhisperModel(
    'large-v3', device="cuda", compute_type="float16"
)
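From there the usual faster-whisper loop applies; a quick usage sketch (audio.wav is a placeholder):

segments, info = model.transcribe("audio.wav", beam_size=5)
print("Detected language:", info.language)
# Segments are generated lazily; iterating triggers the actual transcription.
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")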
Thanks a lot for the effort, been waiting for this and will try it later. Had some quick tests with the official large-v3 yesterday, but the performance was not very satisfactory, with more errors and duplicates when transcribing Japanese.
@thomasmol check out this repo. It has a pytorch_model.bin file.
Thanks to everyone for their contributions to whisper-v3!
I found some mismatches between v2 and v3 in whisper.c in CTranslate2, so I fixed them. Compile this package if you need to use multilingual mode with faster-whisper and large-v3.
Can you please give more info on how I can do this?
@circuluspibo want to make a PR to upstream? it feels like that will resolve a lot of issues. (oops, missed that it was in the other PR too!)
There was already https://github.com/OpenNMT/CTranslate2/pull/1530 fixing that issue (among others).
The model is available now at https://huggingface.co/openai/whisper-large-v3 thanks to @sanchit-gandhi ! I tried converting it to CT translate with the following command:
ct2-transformers-converter --model openai/whisper-large-v3 --output_dir faster-whisper-large-v3 \ --copy_files tokenizer.json --quantization float16
(same as used for v2) However, I get this error:OSError: openai/whisper-large-v3 does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
The huggingface repo does indeed only have
model.safetensors
for the model weights. Anybody have a solution for this? Can we convert the .safetensors to .bin?
Pursuant to the conversation I STARTED HERE, they graciously uploaded the Float32 version, and I believe that the .bin files are up there now. However, they need to be combined before trying to convert, is that correct? Here's an example regarding a different model:
On Windows:
COPY /B dolphin-2.2-70b.Q8_0.gguf-split-a + dolphin-2.2-70b.Q8_0.gguf-split-b dolphin-2.2-70b.Q8_0.gguf
del dolphin-2.2-70b.Q8_0.gguf-split-a dolphin-2.2-70b.Q8_0.gguf-split-b

On Linux/macOS:
cat dolphin-2.2-70b.Q8_0.gguf-split-* > dolphin-2.2-70b.Q8_0.gguf && rm dolphin-2.2-70b.Q8_0.gguf-split-*
Assuming that we have the .bin... as far as converting (either the float32 or float16), the CTranslate2 repository is working on it right now, and I think they're close to a solution if not complete. See HERE.
I'm no expert, but maybe wait to see how the converter is ultimately modified in CTranslate2, since faster-whisper relies on it?
Interested in helping any way I can. Thanks!
Standalone Faster-Whisper r160.3 now supports large-v3, only for Windows atm.
Nice, I'll take a look. Does it use float32 or float16, or both?
Models? The int8_float32 model by default; if you want the float16 model, then type --model=large-v3-fp16.
Cool, I'll check the --help, but thanks for the tip. How did you implement large-v3 so quickly, or is it a trade secret? I know the people at the CTranslate2 GitHub have been working on it; maybe they solved it and you implemented it? I'd like to use large-v3 in a Python script, not the CLI, but if you did it in a proprietary way I can respect that...
Standalone Faster-Whisper r160.3 now supports large-v3, only for Windows atm.

Cannot find any executable files here.
@User1231300 my fork https://github.com/bungerr/faster-whisper-3 should work in the meantime while we work on getting #548 merged
Hey, thank you very much for this. I noticed it's English-only. How can I make it work for other languages too? I need Italian.
Executables are in Releases; it's at the right side of the page.
Guillaume started a job as Machine Learning Engineer at Apple last month (which he absolutely deserved to get), so I honestly don't think he'll have the time to continue his work on faster-whisper :(