SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2

Can I serve speechbrain trained model whisper with faster whisper? #1139

Open cod3r0k opened 5 days ago

cod3r0k commented 5 days ago

Can I serve speechbrain trained model whisper with faster whisper?

MahmoudAshraf97 commented 5 days ago

You have to convert it to CT2 first. There are several converters available; you can check the CT2 documentation for more information.
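
For reference, once the weights are in Hugging Face format, the conversion can be done through CTranslate2's Python API, roughly like this (a minimal sketch; the input and output paths are placeholders and float16 is just one quantization option):

from ctranslate2.converters import TransformersConverter

# Convert a Hugging Face-format Whisper checkpoint into a CTranslate2 model directory
converter = TransformersConverter("path_to_hf_whisper_model")
converter.convert("whisper_ct2_model", quantization="float16")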

cod3r0k commented 5 days ago

Great, can you help me more? What is CT2? @MahmoudAshraf97

MahmoudAshraf97 commented 5 days ago

The backend of Faster Whisper https://github.com/OpenNMT/CTranslate2/

cod3r0k commented 4 days ago

Great. You mean I should do something like the below?

Using transformers:

# First, load the reference Hugging Face Whisper model and processor
from transformers import WhisperProcessor, WhisperForConditionalGeneration
processor = WhisperProcessor.from_pretrained("openai/whisper-large")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")

# Save the model in Hugging Face format
model.save_pretrained("whisper_huggingface")
processor.save_pretrained("whisper_huggingface")

Then, in SpeechBrain:

import torch
from speechbrain.pretrained import WhisperASR

# Load the fine-tuned Whisper model from SpeechBrain
whisper = WhisperASR.from_hparams(source="speechbrain/whisper-large", savedir="tmp_whisper")

# Save the SpeechBrain model weights
model = whisper.modules.model
torch.save(model.state_dict(), "speechbrain_whisper_weights.pth")

from transformers import WhisperForConditionalGeneration
# Load the Hugging Face Whisper model
hf_whisper = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")

# Map the SpeechBrain weights onto the Hugging Face model
import torch

# Load SpeechBrain weights
speechbrain_weights = torch.load("speechbrain_whisper_weights.pth")

# Load Hugging Face model weights
hf_model_state_dict = hf_whisper.state_dict()

# Map weights from SpeechBrain to Hugging Face
mapped_weights = {}
for name, param in hf_model_state_dict.items():
    # Replace this mapping logic with the exact alignment of layers
    if name in speechbrain_weights:
        mapped_weights[name] = speechbrain_weights[name]
    else:
        mapped_weights[name] = param  # Use original HF weights if no match
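
# Before loading, it helps to check how the parameter names actually line up,
# since the SpeechBrain and Hugging Face checkpoints rarely share identical keys
# (a small diagnostic sketch using the variables defined above):
sb_keys = set(speechbrain_weights.keys())
hf_keys = set(hf_model_state_dict.keys())
print("Only in the SpeechBrain checkpoint:", sorted(sb_keys - hf_keys)[:10])
print("Only in the Hugging Face model:", sorted(hf_keys - sb_keys)[:10])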

# Update Hugging Face model with the mapped weights
hf_whisper.load_state_dict(mapped_weights)

# Save the updated model
hf_whisper.save_pretrained("hf_whisper_converted")

# Verify the converted model on a sample audio file
import librosa
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large")

# Load the audio as a 16 kHz waveform; the processor expects raw samples, not a file path
audio_path = "path_to_audio.wav"
audio, _ = librosa.load(audio_path, sr=16000)

inputs = processor(audio, return_tensors="pt", sampling_rate=16000)
generated_ids = hf_whisper.generate(**inputs)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Transcription: {transcription}")

Then

ct2-transformers-converter --model hf_whisper_converted --output_dir whisper_ctranslate2 --quantization float16

Am I doing this correctly?

MahmoudAshraf97 commented 4 days ago

Exactly. If the model you have is not in Hugging Face format, you need to convert it to that format first, and then to CT2.
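
For completeness, the resulting CTranslate2 directory can then be loaded directly with faster-whisper, roughly like this (a minimal sketch; the directory name follows the command above and the audio path is a placeholder):

from faster_whisper import WhisperModel

# Load the converted CTranslate2 model directory
model = WhisperModel("whisper_ctranslate2", device="cpu", compute_type="int8")

# transcribe() returns a generator of segments plus transcription info
segments, info = model.transcribe("path_to_audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")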