aarnphm / whispercpp

Pybind11 bindings for Whisper.cpp
Apache License 2.0
320 stars, 57 forks

from_pretrained load local model #126

Open stoneLee81 opened 1 year ago

stoneLee81 commented 1 year ago

Describe the bug

The code is:

w = Whisper.from_pretrained('/Users/haowmazs/testdata/whisper.cpp-master/models/ggml-medium.bin')

It throws this exception:

RuntimeError: '/Users/haowmazs/testdata/whisper.cpp-master/models/ggml-medium.bin' is not a valid preconverted model. Choose one of ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large']

To reproduce

No response

Expected behavior

No response

Environment

Python 3.9.7, whispercpp 0.017

lostz commented 1 year ago

Same here.

shogunpurple commented 1 year ago

Seeing the same. At a glance, it looks like the problem is this piece of code:

if model_name not in utils.MODELS_URL and not _os.path.isfile(model_name):
    raise RuntimeError(
        f"'{model_name}' is not a valid preconverted model or a file path. \
            Choose one of {list(utils.MODELS_URL)}"
    )

The second part of this conditional appears to be failing. It's odd, because I tried os.path.isfile with the same absolute path in a local Python shell and it works, even if you import os as _os.

Not a huge issue anyway, since you can just download one of the predefined models in MODELS_URL, but hopefully this adds some context.

OS details, in case they're related: OSX Ventura 13.0.1, Python 3.10.0

EDIT: Cache Workaround

In the meantime, you can just populate the cache yourself with your own model.

This library looks in one of two directories for the models based on the existence of the XDG_DATA_HOME env variable. You can put your local model into this directory and it should work as expected.
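The directory lookup described above can be sketched roughly as follows. This is not the library's actual code: the function name cache_dir is hypothetical, and treating an empty XDG_DATA_HOME as unset is an assumption. Only the two directory locations come from this thread.

```python
import os
from pathlib import Path


def cache_dir(app_name: str = "whispercpp") -> Path:
    """Return the assumed model cache directory (hypothetical helper).

    Assumption: $XDG_DATA_HOME/<app_name> when the variable is set and
    non-empty, otherwise ~/.local/share/<app_name>.
    """
    xdg = os.environ.get("XDG_DATA_HOME")
    base = Path(xdg) if xdg else Path.home() / ".local" / "share"
    return base / app_name


if __name__ == "__main__":
    # Print where a local ggml model would need to be copied.
    print(cache_dir())
```

Running this before copying the model shows which of the two directories applies on your machine.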

# move your model into the cache
cp whisper.cpp/models/ggml-base.bin ~/.local/share/whispercpp # OR $XDG_DATA_HOME/whispercpp

from whispercpp import Whisper

# use the local dir with your pretrained whisper model
whisper = Whisper.from_pretrained("base")

marcoacierno commented 1 year ago

If anyone still has this issue, it is probably because the version on PyPI is older than what's in the repo (I noticed the error message is different!).

The source code at https://pypi.org/project/whispercpp/#files only has:

if model_name not in utils.MODELS_URL:
    raise RuntimeError(
        f"'{model_name}' is not a valid preconverted model. Choose one of {list(utils.MODELS_URL)}"
    )
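To make the difference between the two checks quoted in this thread concrete, here is a minimal self-contained sketch. The MODELS_URL dict below is a hypothetical stand-in for utils.MODELS_URL, and check_pypi, check_repo and accepts are illustrative names, not functions from the library.

```python
import os
import tempfile

# Hypothetical stand-in for utils.MODELS_URL; the real mapping holds download URLs.
MODELS_URL = {"tiny": "...", "base": "...", "medium": "..."}


def check_pypi(model_name: str) -> None:
    """Validation as quoted from the PyPI release: known names only."""
    if model_name not in MODELS_URL:
        raise RuntimeError(f"'{model_name}' is not a valid preconverted model.")


def check_repo(model_name: str) -> None:
    """Validation as quoted from the repository: known names or existing files."""
    if model_name not in MODELS_URL and not os.path.isfile(model_name):
        raise RuntimeError(
            f"'{model_name}' is not a valid preconverted model or a file path."
        )


def accepts(check, model_name: str) -> bool:
    """Return True if the given check lets model_name through."""
    try:
        check(model_name)
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    # An existing file stands in for a local ggml model.
    with tempfile.NamedTemporaryFile(suffix=".bin") as f:
        print(accepts(check_pypi, f.name), accepts(check_repo, f.name))
```

Run as a script, this should print False True: an existing file path is rejected by the names-only check and accepted by the check that also calls os.path.isfile. That is consistent with the error message in the original report lacking the "or a file path" wording.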

I fixed the error by installing from Git:

pip install git+https://github.com/aarnphm/whispercpp.git -vv
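To see which release your interpreter actually has installed, a small check with the standard library's importlib.metadata may help. The helper name installed_version is hypothetical. Note that the version string alone may not tell you whether the file path check is present, so treat this as a first hint only.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "whispercpp") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```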