@ErfolgreichCharismatisch did you manage to convert large-v3? How?
I have the same issue. large-v3 works as expected, but whisper-large-v3-german stops the output after processing the first part. If you log it, you can see that it processes the other parts too, but it does not output the text:
'''
# first, convert the model like this:
ct2-transformers-converter --force --model primeline/whisper-large-v3-german --output_dir C:\opt\whisper-large-v3-german --copy_files special_tokens_map.json tokenizer_config.json preprocessor_config.json vocab.json added_tokens.json --quantization float16
'''
from faster_whisper import WhisperModel
import os
import logging
logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)
model_size = "C:\opt\whisper-large-v3-german"
#model_size = "large-v3"
# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")
# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")
segments, info = model.transcribe(os.path.dirname(__file__) + "/audio.mp3", beam_size=5, word_timestamps=False)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
Try:
import ctranslate2
from ctranslate2.converters import TransformersConverter
model_name_or_path = "primeline/whisper-large-v3-german "
output_dir = "whisper-large-v3-german --copy_files"
converter = TransformersConverter(model_name_or_path)
converter.convert(output_dir, quantization="float16", force=True)
small corrections:
import ctranslate2
from ctranslate2.converters import TransformersConverter
model_name_or_path = "primeline/whisper-large-v3-german"
output_dir = "whisper-large-v3-german"
converter = TransformersConverter(model_name_or_path)
converter.convert(output_dir, quantization="float16", force=True)
But there is no difference in the result.
Any news on this? I ran into the same issue trying to convert this particular model to use it in WhisperX.
Also interested
@ErfolgreichCharismatisch Your issue is that the German model is stored in bfloat16. Support for that was added here: https://github.com/OpenNMT/CTranslate2/issues/1121
Instead of float16 quantization, use bfloat16 or int8_bfloat16.
Models that are trained with bfloat16 can have numerical issues when run with float16.
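For example, a conversion sketch using bfloat16 (same converter code as earlier in the thread; the output path is a placeholder):
from ctranslate2.converters import TransformersConverter

model_name_or_path = "primeline/whisper-large-v3-german"
output_dir = "whisper-large-v3-german-ct2"  # placeholder output path
converter = TransformersConverter(model_name_or_path)
# keep the weights in bfloat16, matching how the model was trained
converter.convert(output_dir, quantization="bfloat16", force=True)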
@ErfolgreichCharismatisch By the way, someone already converted the German model: https://huggingface.co/GalaktischeGurke/primeline-whisper-large-v3-german-ct2 https://huggingface.co/flozi00/whisper-large-v3-german-ct2
They all suffer from the same premature ending. I used both with
model = WhisperModel(model_name_or_path, device="cuda", compute_type="bfloat16")
and
model = WhisperModel(model_name_or_path, device="cuda", compute_type="int8_bfloat16")
I converted https://huggingface.co/primeline/whisper-large-v3-turbo-german/tree/main using
ct2-transformers-converter --model primeline/whisper-large-v3-turbo-german --output_dir "some/dir" --quantization float16 --force
Same premature ending.
Only
model_name_or_path = "large-v3"
model = WhisperModel(model_name_or_path, device="cuda", compute_type="float16")
works.
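Before blaming the model, it may be worth checking which compute types the GPU actually supports; a small diagnostic sketch:
import ctranslate2

# prints the set of supported types, e.g. {'float32', 'float16', 'bfloat16', 'int8', ...}
print(ctranslate2.get_supported_compute_types("cuda"))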
@ErfolgreichCharismatisch Yes, I am experimenting with whisper-large-v3-vi and have the same problem. I tried keeping it float32 as in the original, quantizing to float16, and also copying the missing tokenizer.json before converting.
Nothing helps, and it crashes on some exception during inference :-/
When I try to debug it, it only prints "list index out of range" to the console (that will be from my boilerplate code around faster-whisper).
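To see where that "list index out of range" actually originates, printing the full traceback helps; a generic debugging sketch (model path and audio file are placeholders):
import traceback
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
try:
    segments, info = model.transcribe("audio.mp3", beam_size=5)
    # transcription is lazy: the exception only fires while iterating the generator
    for segment in segments:
        print(segment.text)
except IndexError:
    traceback.print_exc()  # full stack trace instead of just the message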
@trungkienbkhn Hey, I want to use Vietnamese with higher precision. Would you know why we cannot convert this? I see you are one of the maintainers. Thank you!!!
I guess the converter is just faulty. Did you try the conversion source code that the model creators supply for some models?
from transformers import WhisperForConditionalGeneration
from ctranslate2.converters import TransformersConverter

model_name_or_path = "Your model name or path here"
output_dir = "where you want to store your model on disk"

# manually download your model from Hugging Face
model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path)
# remove generation settings that the converter chokes on
del model.config.max_length
del model.config.begin_suppress_tokens
# save the patched model so the converter actually sees the modified config
model.save_pretrained(output_dir)

converter = TransformersConverter(output_dir)
converter.convert(output_dir, quantization="float32", force=True)
Once the model is saved into the output directory you desire, add the tokenizer, special_tokens_map.json, and preprocessor_config.json files into that location; you will find them in the base model's repository, which in my case was https://huggingface.co/openai/whisper-large-v3. Do this before integrating it with the WhisperX pipeline; a sketch of the copying step follows below.
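A sketch of that copying step, assuming huggingface_hub is installed (the output directory is a placeholder):
import shutil
from huggingface_hub import hf_hub_download

output_dir = "path/to/converted/model"  # placeholder: the converter's output directory
# fetch the extra files from the base model and place them next to the converted weights
for filename in ["tokenizer.json", "special_tokens_map.json", "preprocessor_config.json"]:
    path = hf_hub_download("openai/whisper-large-v3", filename)
    shutil.copy(path, output_dir)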
Try this and tell me if this works.
Same premature ending.
This works based on the example provided: https://huggingface.co/cstr/whisper-large-v3-turbo-int8_float32
# Import necessary modules
from faster_whisper import WhisperModel
from faster_whisper.transcribe import BatchedInferencePipeline
# Initialize the model
model = WhisperModel("cstr/whisper-large-v3-turbo-int8_float32", device="auto", compute_type="int8")
batched_model = BatchedInferencePipeline(model=model)
segments, info = batched_model.transcribe("audio.wav", batch_size=16)
# Print transcription
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
But it is unclear why. It could be that the newest version, cloned from here rather than installed from pip or conda, makes the difference.
Also, I want to mention that this batched version only accepts wav/mp3, while the non-batched version also accepts wma, which is what I was using.
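A possible workaround for wma input is to re-encode to wav first, assuming ffmpeg is installed and on PATH:
import subprocess

# re-encode audio.wma to 16 kHz mono wav, which the batched pipeline accepts
subprocess.run(
    ["ffmpeg", "-y", "-i", "audio.wma", "-ar", "16000", "-ac", "1", "audio.wav"],
    check=True,
)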
@ErfolgreichCharismatisch What do you mean? The code you provided does not convert anything, and faster-whisper-large-v3-turbo-ct2 is already available here: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
I cloned the latest SYSTRAN/faster-whisper repository a few days ago and still encounter the issue. Can you please elaborate? Thanks.
How can I convert https://huggingface.co/primeline/whisper-large-v3-german to be used with faster-whisper?
Also, can faster-whisper use safetensors and can I convert the above to it?
EDIT: When using
ct2-transformers-converter --model primeline/whisper-large-v3-german --output_dir whisper-large-v3-german --copy_files tokenizer.json --quantization float16
I get
ValueError: Non-consecutive added token '<|0.02|>' found. Should have index 50365 but has index 50366 in saved vocabulary
After upgrading ctranslate2 and transformers, it works.
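For anyone hitting the same ValueError, the upgrade amounts to:
pip install --upgrade ctranslate2 transformers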
I got it to work with model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(model.feature_extractor.sampling_rate, model.feature_extractor.n_fft, n_mels=128)
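For context, large-v3-based checkpoints expect 128 mel bins, while v2 and earlier use 80, so the feature extractor has to match. A sketch of where that line fits, reusing the model path from earlier in the thread:
from faster_whisper import WhisperModel

model = WhisperModel(r"C:\opt\whisper-large-v3-german", device="cuda", compute_type="float16")
# make the feature extractor produce the 128 mel bins the v3 encoder expects
model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(
    model.feature_extractor.sampling_rate,
    model.feature_extractor.n_fft,
    n_mels=128,
)
segments, info = model.transcribe("audio.mp3", beam_size=5)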
Yet the model stops after about 20 words, whereas large-v2 transcribes the whole file. There is no error message, just the tqdm progress bar I wrapped around for segment in segments: freezing at step 1 and then skipping straight to the end.