MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License
2.44k stars, 238 forks

What is the maximum audio file length that can be sent for execution? #186

Open sh1man opened 1 month ago

sh1man commented 1 month ago

What is the maximum duration, in seconds, for an audio file?

MahmoudAshraf97 commented 1 month ago

There is no hard limit; it depends on how much RAM and VRAM you have available.
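For a rough sense of scale, here is a minimal sketch (not code from this project) that estimates the RAM needed just to hold the decoded waveform. It assumes 16 kHz mono audio, which is what Whisper models consume, stored as float32; the function name `waveform_bytes` is hypothetical. Model weights and activations are not counted and will usually account for most of the VRAM use.

```python
def waveform_bytes(duration_s: float,
                   sample_rate: int = 16_000,   # Whisper operates on 16 kHz audio
                   bytes_per_sample: int = 4,   # assumption: float32 samples
                   channels: int = 1) -> int:   # assumption: mono
    """Estimate the bytes needed to hold a decoded waveform in memory."""
    n_samples = int(duration_s * sample_rate)
    return n_samples * bytes_per_sample * channels


if __name__ == "__main__":
    # Print the estimate for a few example durations.
    for hours in (0.5, 1, 4):
        size_mib = waveform_bytes(hours * 3600) / 2**20
        print(f"{hours} h of audio: about {size_mib:.0f} MiB for the raw waveform")
```

The waveform grows linearly with duration, so in practice the limit is likely to come from whatever else has to fit in memory alongside it.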

sh1man commented 1 month ago

Why isn't Silero VAD used in this project?

MahmoudAshraf97 commented 1 month ago

VAD is used in both transcription and diarization. If you are asking why Silero specifically is not used: it would be difficult to replace the built-in VAD in these modules, and there is no incentive to try.
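To illustrate what a VAD stage does in general, here is a toy energy-threshold sketch. It is neither Silero nor the built-in VAD mentioned above (both are neural models); the function names, the 30 ms frame size, and the 0.02 threshold are all assumptions chosen for the example.

```python
import math


def frame_rms(samples, frame_len):
    """Yield the RMS energy of consecutive non-overlapping frames."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        yield math.sqrt(sum(x * x for x in frame) / frame_len)


def speech_segments(samples, sample_rate=16_000, frame_ms=30, threshold=0.02):
    """Return (start_s, end_s) spans whose frame energy reaches the threshold.

    Toy energy-based VAD: a frame counts as speech when its RMS is at least
    `threshold`. Real VADs are far more robust to noise.
    """
    frame_len = sample_rate * frame_ms // 1000
    segments = []
    seg_start = None   # index of the first frame of the current speech run
    n_frames = 0
    for i, rms in enumerate(frame_rms(samples, frame_len)):
        n_frames = i + 1
        if rms >= threshold and seg_start is None:
            seg_start = i                      # speech run begins
        elif rms < threshold and seg_start is not None:
            segments.append((seg_start * frame_len / sample_rate,
                             i * frame_len / sample_rate))
            seg_start = None                   # speech run ends
    if seg_start is not None:                  # audio ended during speech
        segments.append((seg_start * frame_len / sample_rate,
                         n_frames * frame_len / sample_rate))
    return segments


if __name__ == "__main__":
    sr = 16_000
    silence = [0.0] * (sr // 2)
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    print(speech_segments(silence + tone + silence, sample_rate=sr))
```

Only the detected spans would then be passed on to transcription or speaker embedding, which is why the VAD is tightly coupled to those modules.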

sh1man commented 1 month ago

Here is an example of Whisper with VAD integration: https://github.com/ANonEntity/WhisperWithVAD/blob/main/WhisperWithVAD.ipynb

sh1man commented 1 month ago

In the whisperX repository, I saw that they want to "Allow silero-vad as alternative VAD option".