finnvoor closed this issue 6 months ago
Seems like we can't just work around this using `modelFolder`, since `loadTokenizer` also makes a request to Hugging Face.
swift-transformers maintainer here, thanks for the report. Some Hugging Face libraries use an environment variable to prevent HTTP requests when a local cache folder exists for the requested model; would that solution work for you? I'll see if we can also extend the public API to expose this behaviour.
> Some Hugging Face libraries use an environment variable to prevent HTTP requests when a local cache folder exists for the requested model.
Something like that should work. Really, any way we can manually specify the path when we already know it, so that no HTTP requests are required.
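For reference, a minimal sketch of what the check could look like in Swift. The variable name `HF_HUB_OFFLINE` follows the convention of the Python `huggingface_hub` library; whatever swift-transformers ends up using (if anything) may differ, so treat both names here as assumptions:

```swift
import Foundation

/// Hypothetical offline check: skip network requests when the user has
/// opted into offline mode and a local cache for the model exists.
/// `HF_HUB_OFFLINE` mirrors Python's huggingface_hub convention and is
/// an assumption, not a documented swift-transformers variable.
func shouldUseLocalCache(modelCacheURL: URL) -> Bool {
    let offline = ProcessInfo.processInfo.environment["HF_HUB_OFFLINE"] == "1"
    let cacheExists = FileManager.default.fileExists(atPath: modelCacheURL.path)
    return offline && cacheExists
}
```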
There is one caveat with checking for the file's existence: the file also has to be fully downloaded to be usable, which isn't always the case if a download fails. That could get you into a bad state where the app thinks the files are there but they're actually only half downloaded. The logic in the example app right now tries to load the model and only redownloads it if loadModel throws an error. This is something we can bring into the main repo for the tokenizer, and together with @pcuenca's suggestion, this should get offline working again.
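The "try local first, redownload on failure" flow described above might be sketched like this. `Model`, `loadModel(from:)`, and `downloadModel(to:)` are placeholders standing in for the real WhisperKit entry points, not the actual API:

```swift
import Foundation

// Placeholder for whatever model type the real API returns.
struct Model {}

// Placeholders for the real loading/downloading entry points.
func loadModel(from folder: URL) async throws -> Model { Model() }
func downloadModel(to folder: URL) async throws {}

// Try to load the local copy; only hit the network if loading fails.
func loadOrRedownload(modelFolder: URL) async throws -> Model {
    do {
        // Attempt to load from disk; this throws if files are missing
        // or only partially downloaded.
        return try await loadModel(from: modelFolder)
    } catch {
        // Local files are absent or corrupt: fetch a fresh copy, retry.
        try await downloadModel(to: modelFolder)
        return try await loadModel(from: modelFolder)
    }
}
```

This sidesteps the half-downloaded-file problem because a partial download makes the load throw, which triggers a fresh download rather than leaving the app stuck.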
Also, if you haven't seen it yet, @jkrukowski added a new parameter to loadTokenizer in this PR (in both repos) that lets you specify a folder. It won't skip the download currently, but it will with the proposed changes from swift-transformers, and you could, for example, package the tokenizer files with your app and pass in that path.
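For example, if the tokenizer files are bundled as app resources, something like the following could pass their path in. The `tokenizerFolder:` parameter name and the surrounding call shape are assumptions based on the PR description, so check the merged signature before relying on this:

```swift
import Foundation

// Sketch: load a tokenizer from files bundled with the app so that,
// once the proposed swift-transformers changes land, no network
// request is needed. `loadTokenizer(for:tokenizerFolder:)` is an
// assumed signature, not the confirmed API.
let bundledTokenizerURL = Bundle.main.resourceURL!
    .appendingPathComponent("Tokenizers/whisper-tiny")

let tokenizer = try await loadTokenizer(
    for: .tiny,
    tokenizerFolder: bundledTokenizerURL
)
```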
Added a small cleanup PR that might take us closer to solving this problem: https://github.com/argmaxinc/WhisperKit/pull/84
Hugging Face is currently down, meaning WhisperKit doesn't work at the moment 😅. It would be nice to avoid this dependency.
Hugging Face is down again, adding even more reason to do this :)
Indeed 😢. I'll close this out shortly by bringing in @jkrukowski's work on it.
Currently, when using the default WhisperKit flow of auto-downloading models on transcribe, an internet connection is required even if the models have already been downloaded in the past, because swift-transformers fetches the filenames here.
This is a bit limiting: e.g. @pveugen was on a train with poor internet and couldn't transcribe audio even though he had downloaded the model previously (after #80 it throws an error instead of crashing). I think we could get around this by manually downloading the models and specifying the path via the `modelFolder:` parameter of `setupModels`, but it would be nice if there was a way to avoid this HTTP GET by default.
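A sketch of that manual workaround with the current API. The `WhisperKit(modelFolder:)` initializer and `transcribe(audioPath:)` call shown here are assumptions based on the `setupModels` / `modelFolder:` parameter mentioned above; the real signatures may differ:

```swift
import WhisperKit

// Sketch of the manual workaround (API names are assumptions):
// 1. Download the model folder once, by any means (e.g. on first launch).
// 2. On later runs, point WhisperKit at the local folder so no HTTP
//    request to Hugging Face is needed.
let localModelPath = "/path/to/downloaded/models/openai_whisper-tiny"
let whisperKit = try await WhisperKit(modelFolder: localModelPath)
let result = try await whisperKit.transcribe(audioPath: "recording.wav")
```

The key point is that supplying the folder up front should bypass both the model download and the tokenizer's filename fetch, which is exactly what currently breaks when Hugging Face is unreachable.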