MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

offline use #77

Closed: carljmosca closed this 11 months ago

carljmosca commented 1 year ago

Can diarize.py be run offline? I see that the faster-whisper project includes this option. I haven't looked closely at NeMo, but I assume it's possible.

MahmoudAshraf97 commented 1 year ago

If all the models are cached, it can run offline. Otherwise, run it once with an internet connection so that it works offline afterwards.
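faster-whisper fetches its models through the Hugging Face Hub, which stores each repo under a `models--<org>--<name>/snapshots/` folder in its cache. The sketch below is a rough way to check whether a model has already been cached before going offline. `is_model_cached` and `default_hf_cache` are hypothetical helpers, the cache lookup order (`HF_HUB_CACHE`, then `HF_HOME`, then `XDG_CACHE_HOME`) is our understanding of huggingface_hub's defaults, and NeMo's models use their own cache, which this does not cover.

```python
import os
from pathlib import Path
from typing import Optional


def default_hf_cache() -> Path:
    """Approximate huggingface_hub's cache lookup (assumption, see lead-in)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"])
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]) / "hub"
    xdg = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(xdg) / "huggingface" / "hub"


def is_model_cached(repo_id: str, cache_dir: Optional[Path] = None) -> bool:
    """Heuristic: True if at least one snapshot of repo_id exists in the cache.

    A repo such as "org/name" is stored as <cache>/models--org--name/snapshots/<rev>/.
    This does not verify that every file in the snapshot is complete.
    """
    if cache_dir is None:
        cache_dir = default_hf_cache()
    snapshots = cache_dir / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

If this returns `False` for a model the script needs, running diarize.py once with a connection should populate the cache.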

carljmosca commented 1 year ago

It seems to try to download in any case. Without an internet connection, it fails. With a connection, it seems to use the existing .cache files.
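One way to stop the Hugging Face side of the pipeline from attempting downloads is its offline mode. `HF_HUB_OFFLINE` (read by huggingface_hub) and `TRANSFORMERS_OFFLINE` (read by transformers) are real environment variables; `enable_hf_offline_mode` is a hypothetical helper. This is a sketch under the assumption that the relevant downloads go through those libraries; it does not cover fetches made by other code, such as the NeMo config download discussed later in this thread.

```python
import os


def enable_hf_offline_mode() -> None:
    """Tell huggingface_hub (and transformers, if used) to rely on the local cache only.

    Call this before importing huggingface_hub or faster_whisper: huggingface_hub
    reads these variables when it is first imported.
    """
    os.environ["HF_HUB_OFFLINE"] = "1"
    # Only matters if a component uses transformers; harmless otherwise.
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
```

Place the call at the very top of the script, before `from faster_whisper import WhisperModel`; a model that is not already cached will then raise an error instead of being downloaded.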

MahmoudAshraf97 commented 1 year ago

NeMo redownloads the cache after any update to the pip package, so that might be what's happening, but I'm sure you can run it without internet because I've tried it.

carljmosca commented 1 year ago

I can get past the model download attempt by specifying the fully qualified file name, but this call still tries to reach the internet:

 msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to(args.device)

which ultimately results in this:

urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>

It does seem to work nicely with an internet connection (it checks for downloads rather than actually downloading).

MahmoudAshraf97 commented 1 year ago

I see. I don't know how I missed this entirely: it tries to download the config file on each run. I'll think of a solution to cache the file or include it in the code.
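The "cache the file" option amounts to downloading the config once and reusing the local copy on every later run. The sketch below shows that pattern in general form; it is not the fix that was merged into the repository. `get_cached_file` and `_download` are hypothetical names, and the URL and cache path a caller passes in are placeholders.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Callable


def _download(url: str, dest: Path) -> None:
    """Fetch url into dest via a temporary ".part" file, so an interrupted
    download never leaves a truncated file at dest."""
    tmp = dest.with_suffix(dest.suffix + ".part")
    with urllib.request.urlopen(url, timeout=30) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)


def get_cached_file(url: str, cache_path: Path,
                    fetch: Callable[[str, Path], None] = _download) -> Path:
    """Return a local copy of url, touching the network only on a cache miss."""
    if cache_path.is_file():
        return cache_path  # cache hit: works with no internet connection
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    try:
        fetch(url, cache_path)
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        raise RuntimeError(
            f"{cache_path} is not cached and {url} could not be fetched; "
            "run once with an internet connection first"
        ) from exc
    return cache_path
```

The `fetch` parameter exists so the download step can be swapped out (for example in tests); the alternative the maintainer mentions, shipping the config inside the code, removes the network dependency entirely at the cost of having to update the bundled copy by hand.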

mrgreyigogg commented 1 year ago

Also looking for fully offline use.

MahmoudAshraf97 commented 11 months ago

Offline usage has been added