MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License
2.53k stars, 243 forks

Danish language support #134

Closed: kasperhk closed this issue 2 months ago

kasperhk commented 8 months ago

Requesting support for the Danish language.

vladgrand2 commented 8 months ago

Add the following to diarize.py:

import argparse

from whisperx.utils import LANGUAGES, TO_LANGUAGE_CODE

parser = argparse.ArgumentParser()

# Accept either a language code (e.g. "da") or a title-cased language name (e.g. "Danish")
parser.add_argument(
    "--language",
    type=str,
    default=None,
    choices=sorted(LANGUAGES.keys()) + sorted([k.title() for k in TO_LANGUAGE_CODE.keys()]),
    help="Language spoken in the audio; omit this option to perform language detection",
)
args = parser.parse_args()

language = args.language

Then run: python diarize.py --language da
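The `choices` list above accepts both codes such as `da` and title-cased names such as `Danish`. If downstream code needs a language code rather than a name, the value can be normalized first. This is a minimal sketch, assuming a small hypothetical subset of the language table instead of importing whisperx; `build_parser` and `normalize_language` are illustrative helpers, not part of diarize.py.

```python
import argparse
from typing import Optional

# Hypothetical subset of the code -> name table; the real LANGUAGES and
# TO_LANGUAGE_CODE tables in whisperx.utils cover many more languages.
LANGUAGES = {"da": "danish", "de": "german", "en": "english"}
TO_LANGUAGE_CODE = {name: code for code, name in LANGUAGES.items()}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--language",
        type=str,
        default=None,
        # Codes ("da") and title-cased names ("Danish") are both valid choices
        choices=sorted(LANGUAGES) + sorted(k.title() for k in TO_LANGUAGE_CODE),
        help="Language spoken in the audio; omit to perform language detection",
    )
    return parser


def normalize_language(value: Optional[str]) -> Optional[str]:
    """Map a language name or code to its code; None means auto-detect."""
    if value is None:
        return None
    value = value.lower()
    if value in LANGUAGES:
        return value
    return TO_LANGUAGE_CODE[value]


args = build_parser().parse_args(["--language", "Danish"])
print(normalize_language(args.language))  # prints "da"
```

Leaving `--language` unset keeps the value at `None`, which is passed through unchanged so that language detection still runs.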

PiotrEsse commented 7 months ago

It seems the code is already there, but for me it doesn't work with any language other than English.

MahmoudAshraf97 commented 7 months ago


Make sure that you are using a multilingual model; the default model is English-only.
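Whisper's English-only checkpoints carry a `.en` suffix (for example `medium.en`), and the same name without the suffix is the multilingual model. This is a minimal sketch of choosing a multilingual checkpoint when the requested language is not English. `choose_model` is a hypothetical helper, not part of diarize.py. Assuming the script exposes a model-selection option (check `python diarize.py --help` for its exact name), passing a multilingual name such as `medium` there should have the same effect.

```python
from typing import Optional

# Whisper checkpoint names ending in ".en" (tiny.en, base.en, small.en,
# medium.en) are English-only; dropping the suffix gives the multilingual model.
ENGLISH_ONLY_SUFFIX = ".en"


def choose_model(model_name: str, language: Optional[str]) -> str:
    """Return a checkpoint name that can handle `language`.

    language=None means auto-detection, which also needs a multilingual model.
    """
    lang = language.lower() if language else None
    if model_name.endswith(ENGLISH_ONLY_SUFFIX) and lang not in ("en", "english"):
        # Strip ".en" to switch to the multilingual checkpoint
        return model_name[: -len(ENGLISH_ONLY_SUFFIX)]
    return model_name


print(choose_model("medium.en", "da"))  # prints "medium"
```

Names without a `.en` suffix, such as the large checkpoints, are returned unchanged because they are already multilingual.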