argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License

Expose downloadBase in WhisperKit init #57

Closed: finnvoor closed this 6 months ago

finnvoor commented 6 months ago

Allows setting a custom download location while still using the automatic model downloading.

finnvoor commented 6 months ago

This could maybe be replaced by downloading the model to modelFolder when download is true, but right now I think there is a mismatch between modelFolder and downloadBase: modelFolder contains the models themselves, while downloadBase contains models/argmaxinc/whisperkit-coreml. That makes it a bit tricky to work with.

ZachNagengast commented 6 months ago

Yes, I agree it's a bit tricky. We'll have to be careful with this because changing the location could easily create "orphan" models on the filesystem. downloadBase is relative to Hugging Face, since that's where other HF repos go if someone decides to use a different model than our pre-generated ones. Its default location (the user's documents folder) comes from the swift-transformers library, similar to how the Python transformers library places all HF models in the same place. The modelFolder is the specific folder within that repo that contains the models that actually get used.

Apps should be fairly sheltered from having to deal with this because we return the local path upon download completion, so I think it makes the most sense to allow changing the downloadBase and let that define the modelFolder location. We are also discussing whether to move all the variants into their own repos to match HF's typical way of storing models. Open to input!
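To make the relationship between the two paths concrete, here is a minimal sketch of the layout as described in this thread. It is not WhisperKit code: the `modelFolder(downloadBase:repo:variant:)` helper, the `/custom/base` path, and the `example-variant` folder name are all hypothetical, and the only detail taken from the discussion is the `models/argmaxinc/whisperkit-coreml` segment.

```swift
import Foundation

// Hypothetical helper, not part of WhisperKit. It only illustrates the
// layout discussed above: downloadBase holds models/<repo>, and
// modelFolder is a specific folder inside that repo.
func modelFolder(
    downloadBase: URL,
    repo: String = "argmaxinc/whisperkit-coreml",
    variant: String
) -> URL {
    var url = downloadBase.appendingPathComponent("models")
    // Append the repo id one segment at a time ("argmaxinc", then "whisperkit-coreml").
    for segment in repo.split(separator: "/") {
        url = url.appendingPathComponent(String(segment))
    }
    return url.appendingPathComponent(variant)
}

// Assumed custom base directory and variant name, for illustration only.
let base = URL(fileURLWithPath: "/custom/base", isDirectory: true)
let folder = modelFolder(downloadBase: base, variant: "example-variant")
print(folder.path)
```

Under these assumptions, changing `downloadBase` moves every derived `modelFolder` with it, which is why models downloaded under a previous base could end up orphaned.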

This PR looks good because it'll let folks set the downloadBase to any preferred location, and it's a good follow-up to #34. In any case, since it's just adding an optional parameter that defaults to nil, I don't expect it to cause much conflict with existing code.
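As a rough illustration of why an optional parameter defaulting to nil is low-risk, the sketch below uses a hypothetical `ExampleLoader` type rather than the real WhisperKit initializer, whose full signature is not shown in this thread. The fallback to the user's documents folder mirrors the default described above; the temporary-directory fallback is an assumption added only so the sketch never fails to produce a URL.

```swift
import Foundation

// Hypothetical stand-in for a type with an init like the one in this PR.
struct ExampleLoader {
    let model: String
    let downloadBase: URL

    // downloadBase is optional and defaults to nil, so existing call
    // sites that omit it keep compiling and keep the default location.
    init(model: String, downloadBase: URL? = nil) {
        self.model = model
        self.downloadBase = downloadBase
            ?? FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first
            ?? FileManager.default.temporaryDirectory // assumed last-resort fallback
    }
}

// Existing-style call: no downloadBase, default location is used.
let defaultLoader = ExampleLoader(model: "example-variant")

// New-style call: caller chooses where downloads go.
let customLoader = ExampleLoader(
    model: "example-variant",
    downloadBase: URL(fileURLWithPath: "/custom/base", isDirectory: true)
)
print(defaultLoader.downloadBase.path)
print(customLoader.downloadBase.path)
```

Both call styles compile against the same initializer, which is the property that keeps the change source-compatible.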