Avoid requiring an internet connection to transcribe

argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon

http://argmaxinc.com/blog/whisperkit

MIT License

Avoid requiring an internet connection to transcribe #81

Closed (finnvoor closed this 6 months ago)

finnvoor commented 8 months ago

Currently, when using the default WhisperKit flow of automatically downloading models on transcribe, an internet connection is required even if the models have already been downloaded, because swift-transformers still makes a request to fetch the filenames.

This is a bit limiting. For example, @pveugen was on a train with poor internet and couldn't transcribe audio even though the model had been downloaded earlier (after #80 it throws an error instead of crashing). I think we could get around this by manually downloading the model and specifying its path via the modelFolder: parameter of setupModels, but it would be nice if there was a way to avoid this HTTP GET by default.

finnvoor commented 8 months ago

Seems like we can't just work around this using modelFolder, since loadTokenizer also makes a request to Hugging Face.

pcuenca commented 8 months ago

swift-transformers maintainer here, thanks for the report. Some Hugging Face libraries use an environment variable to prevent HTTP requests if a local cache folder exists for the requested model. Would that solution work for you? I'll see if we can also augment the public API to indicate this behaviour.
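To make the environment-variable idea concrete, here is a minimal sketch. The comment above does not name a variable; `HF_HUB_OFFLINE` is the one used by Hugging Face's Python `huggingface_hub` library and serves only as a stand-in. The function `shouldSkipNetwork` and its parameters are hypothetical and not part of swift-transformers or WhisperKit.

```swift
import Foundation

/// Hypothetical helper, not part of swift-transformers: decides whether the
/// network request can be skipped before loading a model.
/// Assumption: offline mode is signalled by `HF_HUB_OFFLINE`, as in the
/// Python `huggingface_hub` library.
func shouldSkipNetwork(
    localFolder: URL,
    environment: [String: String] = ProcessInfo.processInfo.environment,
    fileManager: FileManager = .default
) -> Bool {
    // Offline mode must be requested explicitly.
    let flag = (environment["HF_HUB_OFFLINE"] ?? "").lowercased()
    guard ["1", "true", "yes"].contains(flag) else { return false }

    // Only skip the request if the local folder exists and is a directory.
    var isDirectory: ObjCBool = false
    guard fileManager.fileExists(atPath: localFolder.path, isDirectory: &isDirectory),
          isDirectory.boolValue else { return false }

    // An empty folder cannot hold a usable model, so fall back to the network.
    let contents = (try? fileManager.contentsOfDirectory(atPath: localFolder.path)) ?? []
    return !contents.isEmpty
}
```

This check only establishes that a non-empty cache folder exists. It does not prove that the files in it are complete, so a caller would still need to handle a load failure.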

finnvoor commented 8 months ago

"Some Hugging Face libraries use an environment variable to prevent http requests if a local cache folder exists for the requested model."

Something like that should work. Really, any way to manually specify the path when we already know it, so that no HTTP requests are needed, would do.

ZachNagengast commented 8 months ago

There is one caveat with checking for the file's existence: the file also has to be fully downloaded to be usable, which isn't always the case if a download fails. That could leave you in a bad state where it thinks the files are there when they are only half downloaded. The logic in the example app right now actually tries to load the model and only redownloads it if loadModel throws an error. This is something we can bring into the main repo for the tokenizer, and together with @pcuenca's suggestion, it should get offline use working again.

Also, if you haven't seen it yet, @jkrukowski added a new param on loadTokenizer in this PR (and in both repos) that lets you specify a folder. It won't skip the download currently, but it will with the proposed changes from swift-transformers. In theory, you could package the tokenizer files with your app and pass in that path.
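The load-first strategy described in this comment could be sketched roughly as follows. This is not the example app's code: `loadPreferringLocal` and its two closure parameters are hypothetical names, and the sketch is synchronous for brevity, whereas real loading and downloading would likely be asynchronous.

```swift
import Foundation

/// Hypothetical sketch, not WhisperKit API: try the local copy first and
/// fall back to a download only if loading throws.
func loadPreferringLocal<Model>(
    load: () throws -> Model,
    redownload: () throws -> Void
) throws -> Model {
    do {
        // A successful load shows the files are present and usable,
        // which a plain existence check cannot guarantee.
        return try load()
    } catch {
        // Missing or half-downloaded files: fetch them again, then retry once.
        try redownload()
        return try load()
    }
}
```

With this shape, no network request is made when the local files load cleanly, and a partial download repairs itself on the next attempt instead of leaving the cache in a bad state.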

jkrukowski commented 8 months ago

Added a small cleanup PR that might take us closer to solving this problem: https://github.com/argmaxinc/WhisperKit/pull/84

finnvoor commented 6 months ago

HuggingFace is currently down meaning WhisperKit doesn't work at the moment 😅, would be nice to avoid this

shawiz commented 6 months ago

HuggingFace down again, adding more reason to do this :)

ZachNagengast commented 6 months ago

Indeed 😢 will close this out shortly bringing in @jkrukowski's work on it.