@ErfolgreichCharismatisch did you manage to convert large-v3? How?
I have the same issue. large-v3 works as expected, but whisper-large-v3-german stops the output after processing the first part. If you log it, you can see that it processes the other parts too, but it does not output the text:
'''
# first, convert the model like this:
ct2-transformers-converter --force --model primeline/whisper-large-v3-german --output_dir C:\opt\whisper-large-v3-german --copy_files special_tokens_map.json tokenizer_config.json preprocessor_config.json vocab.json added_tokens.json --quantization float16
'''
from faster_whisper import WhisperModel
import os
import logging
logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)
model_size = "C:\opt\whisper-large-v3-german"
#model_size = "large-v3"
# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")
# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")
segments, info = model.transcribe(os.path.dirname(__file__) + "/audio.mp3", beam_size=5, word_timestamps=False)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
Try:
import ctranslate2
from ctranslate2.converters import TransformersConverter
model_name_or_path = "primeline/whisper-large-v3-german "
output_dir = "whisper-large-v3-german --copy_files"
converter = TransformersConverter(model_name_or_path)
converter.convert(output_dir, quantization="float16", force=True)
small corrections:
import ctranslate2
from ctranslate2.converters import TransformersConverter
model_name_or_path = "primeline/whisper-large-v3-german"
output_dir = "whisper-large-v3-german"
converter = TransformersConverter(model_name_or_path)
converter.convert(output_dir, quantization="float16", force=True)
But there is no difference in the result.
Any news on this? I ran into the same issue trying to convert this particular model to use it in WhisperX.
Also interested
@ErfolgreichCharismatisch Your issue is that the German model is stored in bfloat16. Support for that was added here: https://github.com/OpenNMT/CTranslate2/issues/1121
Instead of float16 quantization, use bfloat16 or int8_bfloat16.
Models that are trained with bfloat16 can have numerical issues when run with float16.
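For example, a conversion sketch using bfloat16 (same converter code as earlier in the thread; the output path is a placeholder):
from ctranslate2.converters import TransformersConverter

model_name_or_path = "primeline/whisper-large-v3-german"
output_dir = "whisper-large-v3-german-ct2"  # placeholder output path
converter = TransformersConverter(model_name_or_path)
# keep the weights in bfloat16, matching how the model was trained
converter.convert(output_dir, quantization="bfloat16", force=True)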
@ErfolgreichCharismatisch By the way, someone already converted the German model: https://huggingface.co/GalaktischeGurke/primeline-whisper-large-v3-german-ct2 https://huggingface.co/flozi00/whisper-large-v3-german-ct2
They all suffer from the same premature ending. I used both with
model = WhisperModel(model_name_or_path, device="cuda", compute_type="bfloat16")
and
model = WhisperModel(model_name_or_path, device="cuda", compute_type="int8_bfloat16")
I converted https://huggingface.co/primeline/whisper-large-v3-turbo-german/tree/main using
ct2-transformers-converter --model primeline/whisper-large-v3-turbo-german --output_dir "some/dir" --quantization float16 --force
Same premature ending.
Only
model_name_or_path = "large-v3"
model = WhisperModel(model_name_or_path, device="cuda", compute_type="float16")
works.
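Before blaming the model, it may be worth checking which compute types the GPU actually supports; a small diagnostic sketch:
import ctranslate2

# prints the set of supported types, e.g. {'float32', 'float16', 'bfloat16', 'int8', ...}
print(ctranslate2.get_supported_compute_types("cuda"))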
@ErfolgreichCharismatisch Yes, I am experimenting with whisper-large-v3-vi and have the same problem. I tried keeping it float32 as in the original, quantizing to float16, and also copying the missing tokenizer.json before converting.
Nothing helps, and it crashes on some exception during inference :-/
When I try to debug it, it only prints "list index out of range" to the console (that will be from my boilerplate code around faster-whisper).
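To see where that "list index out of range" actually originates, printing the full traceback helps; a generic debugging sketch (model path and audio file are placeholders):
import traceback
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
try:
    segments, info = model.transcribe("audio.mp3", beam_size=5)
    # transcription is lazy: the exception only fires while iterating the generator
    for segment in segments:
        print(segment.text)
except IndexError:
    traceback.print_exc()  # full stack trace instead of just the message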
@trungkienbkhn Hey, I want to use Vietnamese with higher precision. Would you know why we cannot convert this? I see you are one of the maintainers. Thank you!!!
I guess the converter is just faulty. Did you try the conversion source code that the model creators supply for some models?
from transformers import WhisperForConditionalGeneration
from ctranslate2.converters import TransformersConverter

model_name_or_path = "Your model name or path here"
output_dir = "where you want to store your model on disk"

# manually download your model from Hugging Face
model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path)
# remove generation settings that the converter chokes on
del model.config.max_length
del model.config.begin_suppress_tokens
# save the patched model so the converter actually sees the modified config
model.save_pretrained(output_dir)

converter = TransformersConverter(output_dir)
converter.convert(output_dir, quantization="float32", force=True)
Once the model is saved into the output directory you desire, add the tokenizer, special_tokens_map.json, and preprocessor_config.json files into that location; you will find them in the base model's repository, which in my case was https://huggingface.co/openai/whisper-large-v3. Do this before integrating it with the WhisperX pipeline; a sketch of the copying step follows below.
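A sketch of that copying step, assuming huggingface_hub is installed (the output directory is a placeholder):
import shutil
from huggingface_hub import hf_hub_download

output_dir = "path/to/converted/model"  # placeholder: the converter's output directory
# fetch the extra files from the base model and place them next to the converted weights
for filename in ["tokenizer.json", "special_tokens_map.json", "preprocessor_config.json"]:
    path = hf_hub_download("openai/whisper-large-v3", filename)
    shutil.copy(path, output_dir)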
Try this and tell me if this works.
Same premature ending.
This works based on the example provided: https://huggingface.co/cstr/whisper-large-v3-turbo-int8_float32
# Import necessary modules
from faster_whisper import WhisperModel
from faster_whisper.transcribe import BatchedInferencePipeline
# Initialize the model
model = WhisperModel("cstr/whisper-large-v3-turbo-int8_float32", device="auto", compute_type="int8")
batched_model = BatchedInferencePipeline(model=model)
segments, info = batched_model.transcribe("audio.wav", batch_size=16)
# Print transcription
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
But it is unclear why. It could be that the newest version, cloned from here rather than installed from pip or conda, makes the difference.
Also, I want to mention that this batched version only accepts wav/mp3, while the non-batched version also accepts wma, which is what I was using.
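A possible workaround for wma input is to re-encode to wav first, assuming ffmpeg is installed and on PATH:
import subprocess

# re-encode audio.wma to 16 kHz mono wav, which the batched pipeline accepts
subprocess.run(
    ["ffmpeg", "-y", "-i", "audio.wma", "-ar", "16000", "-ac", "1", "audio.wav"],
    check=True,
)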
@ErfolgreichCharismatisch What do you mean? The code you provided does not convert anything, and faster-whisper-large-v3-turbo-ct2 is already available here: https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2
I cloned the latest SYSTRAN/faster-whisper repository a few days ago and still encounter the issue. Can you please elaborate? Thanks.
How can I convert https://huggingface.co/primeline/whisper-large-v3-german to be used with faster-whisper?
Also, can faster-whisper use safetensors and can I convert the above to it?
EDIT: When using
ct2-transformers-converter --model primeline/whisper-large-v3-german --output_dir whisper-large-v3-german --copy_files tokenizer.json --quantization float16
I get
ValueError: Non-consecutive added token '<|0.02|>' found. Should have index 50365 but has index 50366 in saved vocabulary
After upgrading ctranslate2 and transformers, it works.
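For anyone hitting the same ValueError, the upgrade amounts to:
pip install --upgrade ctranslate2 transformers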
I got it to work with model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(model.feature_extractor.sampling_rate, model.feature_extractor.n_fft, n_mels=128)
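For context, large-v3-based checkpoints expect 128 mel bins, while v2 and earlier use 80, so the feature extractor has to match. A sketch of where that line fits, reusing the model path from earlier in the thread:
from faster_whisper import WhisperModel

model = WhisperModel(r"C:\opt\whisper-large-v3-german", device="cuda", compute_type="float16")
# make the feature extractor produce the 128 mel bins the v3 encoder expects
model.feature_extractor.mel_filters = model.feature_extractor.get_mel_filters(
    model.feature_extractor.sampling_rate,
    model.feature_extractor.n_fft,
    n_mels=128,
)
segments, info = model.transcribe("audio.mp3", beam_size=5)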
Yet the model stops after about 20 words, whereas large-v2 transcribes the whole file. There is no error message, just the tqdm progress bar I wrapped around for segment in segments: freezing at step 1 and then skipping straight to the end.