MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License
3.75k stars, 329 forks

Pointing to local models #274

Closed: kllgjc closed this issue 1 day ago

kllgjc commented 1 day ago

I am trying to process multiple files using a script, but unfortunately my internet security software keeps turning on and blocking Hugging Face and other downloads, even though the models are already downloaded. How do I point to these already-downloaded models?

MahmoudAshraf97 commented 1 day ago

Use the model path instead of the model name.

kllgjc commented 1 day ago

I'm stupid and don't actually know how any of this works; I can just make it work. Can you give me one example, please? <3

MahmoudAshraf97 commented 1 day ago

Can you show me how you are using it?

kllgjc commented 1 day ago

I'm just running this:

import os
import subprocess
from pathlib import Path

def process_files_in_directory():
    # Prompt user for the directory path
    folder = input("Enter the path to the directory containing audio files: ").strip()
    folder_path = Path(folder)

    if not folder_path.is_dir():
        print(f"Error: {folder} is not a valid directory.")
        return

    # Supported audio file extensions
    supported_extensions = [".mp3", ".wav", ".flac", ".m4a", ".mp4"]
    audio_files = [file for file in folder_path.iterdir() if file.suffix.lower() in supported_extensions]

    if not audio_files:
        print("No supported audio files found in the directory.")
        return

    # Create "transcriptions" subfolder
    output_folder = folder_path / "transcriptions"
    output_folder.mkdir(exist_ok=True)

    print(f"Processing {len(audio_files)} audio file(s)...")

    for i, audio_file in enumerate(audio_files, 1):
        print(f"\n[{i}/{len(audio_files)}] Processing file: {audio_file.name}")

        # Construct command for processing each file
        command = [
            "python", "diarize.py",
            "--audio", str(audio_file),
            "--whisper-model", "large-v3",
            "--batch-size", "32",
            "--language", "en",
            "--suppress_numerals",
            "--device", "cuda",
        ]

        # Execute the command; check=True stops the batch if diarize.py fails
        subprocess.run(command, check=True)

        # Move output files (e.g., .txt and .srt) to the "transcriptions" folder
        output_files = list(folder_path.glob(f"{audio_file.stem}.*"))
        moved_files = []
        for output_file in output_files:
            if output_file.suffix in [".txt", ".srt"]:
                destination = output_folder / output_file.name
                output_file.rename(destination)
                moved_files.append(destination)

        # Intermediate update
        print(f"File processed: {audio_file.name}")
        print(f"Output files saved:")
        for file in moved_files:
            print(f"  - {file}")

    print(f"\nProcessing completed. Transcriptions saved in: {output_folder}")

if __name__ == "__main__":
    process_files_in_directory()

MahmoudAshraf97 commented 1 day ago

        # Construct command for processing each file
        command = [
            "python", "diarize.py",
            "--audio", str(audio_file),
-            "--whisper-model", "large-v3",
+            "--whisper-model", "local_model_path",
            "--batch-size", "32",
            "--language", "en",
            "--suppress_numerals",
            "--device", "cuda",
        ]
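In other words, `--whisper-model` can take a filesystem path to a directory that already contains the converted CTranslate2 weights (`model.bin`, `config.json`, tokenizer files) instead of a model name. One way to obtain such a directory on a machine that does have network access is `huggingface_hub.snapshot_download`. This is a hedged sketch, not part of the repo's documented interface; the repo id below (`Systran/faster-whisper-large-v3`) is the one faster-whisper pulls `large-v3` from, but double-check it for the model size you use.

```python
# Sketch: fetch the CTranslate2 weights once, then reuse the local copy.
# Assumes huggingface_hub is installed (it is a faster-whisper dependency).
REPO_ID = "Systran/faster-whisper-large-v3"  # repo holding the large-v3 weights

def fetch_whisper_model(repo_id: str = REPO_ID) -> str:
    """Download (or reuse the cached copy of) a faster-whisper model and
    return the local directory to pass as --whisper-model."""
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id)
```

Call `fetch_whisper_model()` once while online and pass the returned directory as `--whisper-model`; subsequent runs read from the local cache and never touch the network.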
kllgjc commented 1 day ago

Thanks, I got it working. I also had to deal with where the NeMo models and other files were (thanks ChatGPT): I just hardcoded the locations into diarize.py, nemo_process.py, and helpers.py. I don't know if this is the best way to do it, but that's how I did it! Is this what the .yaml files are for, though? I just typed "local" into the code search and those popped up. Again, I'm dumb but just trying to figure this out! It worked really well, especially after post-processing the output with Claude using the prompt below (it would be better with the API so I don't have to keep saying "next", haha). A small local LLM could possibly do this step too!

"I have attached an audio transcription output with diarization, generated using OpenAI's Whisper model. I need your help refining the transcription to ensure it is as accurate and professional as possible. These transcriptions are from in-house presentations at my company, [Company Name], [Company Description]. The title of the transcription file corresponds to the presentation's title, so please use that context to infer and maintain relevance while editing.

Review and improve the transcription, focusing on:

It is critical that this work is done to a high standard, as I have taken responsibility for these transcriptions and must deliver polished, accurate results. If I do an excellent job, I could earn a significant bonus, and I'll share part of it with you for your effort!

Only provide the refined transcription output as requested, as I will be copying and pasting it into a Word document in its entirety. Be comprehensive; don't skip anything!

Again, only provide the transcription, and nothing else! Use as many tokens as you can in each output; I will reply with "next" and you can pick up where you left off."

Thanks again for your help and for creating this repo!
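A hedged alternative to hardcoding paths in diarize.py, nemo_process.py, and helpers.py: the Hugging Face libraries honour a few standard environment variables, so once every model has been downloaded once you can often force fully offline operation without touching the code. These variables belong to huggingface_hub/transformers, not to this repo specifically, and the NeMo checkpoints keep their own cache (typically under `~/.cache/torch/NeMo`), so verify against your installed versions.

```shell
# Standard Hugging Face environment variables (assumes your installed
# versions of huggingface_hub/transformers honour them, which recent ones do):
export HF_HOME="$HOME/.cache/huggingface"  # where hub downloads are cached
export HF_HUB_OFFLINE=1                    # huggingface_hub: never hit the network
export TRANSFORMERS_OFFLINE=1              # transformers: load from cache only
```

With those set (and the models already cached), running the batch script should no longer trigger any downloads for the security software to block.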