Vaibhavs10 / insanely-fast-whisper

Apache License 2.0

segmentation and diarization #238

Open danielbowne opened 3 months ago

danielbowne commented 3 months ago

For people working with potentially sensitive audio or data, how can we handle segmentation and diarization locally instead of relying on HF API calls? Is this an option that I am overlooking? Great tool, by the way; I am very impressed with my initial tests.

WilliamBonvini commented 2 months ago

I haven't had the chance to read the codebase yet, but I've been able to transcribe and diarize an audio file with no internet connection, so I suppose the token is not used during inference, only for downloading the model. I might be wrong, though.
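One way to check this on your own machine is to confirm the model is already in the local Hugging Face cache and then forbid network access before running anything. The snippet below is only a rough sketch under a few assumptions: the helper names (`hf_cache_dir`, `is_cached`, `force_offline`) are made up for illustration, the repo id `pyannote/speaker-diarization-3.1` is just an example and may not be the model your setup uses, and a diarization pipeline can depend on further models (segmentation, embedding) that need the same check. It relies only on the standard cache layout (`models--<org>--<name>/snapshots/`) and the `HF_HUB_OFFLINE` / `TRANSFORMERS_OFFLINE` environment variables.

```python
import os
from pathlib import Path
from typing import Optional


def hf_cache_dir() -> Path:
    """Resolve the Hugging Face hub cache directory from the usual env vars."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def is_cached(repo_id: str, cache_dir: Optional[Path] = None) -> bool:
    """Return True if at least one snapshot of repo_id exists in the local cache."""
    cache_dir = cache_dir or hf_cache_dir()
    # Cached model repos live under models--<org>--<name>/snapshots/<revision>/
    snapshots = cache_dir / f"models--{repo_id.replace('/', '--')}" / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


def force_offline() -> None:
    """Ask huggingface_hub and transformers not to make network calls.

    These variables are read when the libraries are imported, so call this
    before importing them (or export the variables in the shell instead).
    """
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


if __name__ == "__main__":
    repo = "pyannote/speaker-diarization-3.1"  # example id, an assumption
    print(f"{repo} cached locally: {is_cached(repo)}")
    force_offline()
```

If the model shows up as cached and the run still succeeds with the offline variables set, that would support the idea that the token is only needed for the initial download.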

Also, I suggest giving the issue a more meaningful name, something like "local segmentation and diarization" or "privacy concerns for segmentation and diarization".

anujbohra23 commented 1 month ago

Hello, I wanted to test the diarization process on my local machine, but it is not working for some reason. Could you please provide the code you used?

WilliamBonvini commented 1 month ago

@anujbohra23 sorry, but I don't have the code snippet anymore. I used insanely-fast-whisper's CLI commands to achieve it. It would help if you described exactly what is not working so that someone can help you out.