bentoml / OpenLLM

Run any open-source LLMs, such as Llama and Gemma, as OpenAI-compatible API endpoints in the cloud.
https://bentoml.com
Apache License 2.0

bug: Model is not found in BentoML store, you may need to run `bentoml models pull` first #229

Closed · Lukikay closed this 1 year ago

Lukikay commented 1 year ago

Describe the bug

Hi there, thanks for providing this brilliant work!

I cannot run the Baichuan-13B-Chat model successfully; it fails with: Model is not found in BentoML store <osfs '/root/bentoml/models'>, you may need to run `bentoml models pull` first

However, I found that the safetensors files had already been generated in /root/bentoml/models
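For reference, this is a quick way to double-check what the store actually contains (a sketch; the commands assume the default /root/bentoml home shown in the logs):

# The exact tag from the error (pt-baichuan-inc-baichuan-13b-chat:a4a5...)
# should appear here if the model was registered correctly
bentoml models list

# Raw files under the store directory; as I understand it, weights on disk
# alone are not enough without the store's matching metadata entry
ls /root/bentoml/models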

Thanks in advance

To reproduce

openllm start baichuan --model-id baichuan-inc/Baichuan-13B-Chat --device 0,1 --debug

Logs

root@GPU11:/Working# openllm start baichuan --model-id baichuan-inc/Baichuan-13B-Chat --device 0,1 --debug
2023-08-17T09:43:00+0000 [DEBUG] [cli] Importing service "_service:svc" from working dir: "/usr/local/lib/python3.10/site-packages/openllm"
Error: [bentoml-cli] `serve` failed: Model 'pt-baichuan-inc-baichuan-13b-chat:a4a558127068f2ce965aa56aeb826bf501a68970' is not found in BentoML store <osfs '/root/bentoml/models'>, you may need to run `bentoml models pull` first
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/site-packages/bentoml/__main__.py", line 4, in <module>
    cli()
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bentoml_cli/utils.py", line 367, in wrapper
    raise err from None
  File "/usr/local/lib/python3.10/site-packages/bentoml_cli/utils.py", line 362, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bentoml_cli/utils.py", line 333, in wrapper
    return_value = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bentoml_cli/utils.py", line 290, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bentoml_cli/env_manager.py", line 122, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/bentoml_cli/serve.py", line 260, in serve
    serve_http_production(
  File "/usr/local/lib/python3.10/site-packages/simple_di/__init__.py", line 139, in _
    return func(*_inject_args(bind.args), **_inject_kwargs(bind.kwargs))
  File "/usr/local/lib/python3.10/site-packages/bentoml/serve.py", line 278, in serve_http_production
    svc = load(bento_identifier, working_dir=working_dir)
  File "/usr/local/lib/python3.10/site-packages/bentoml/_internal/service/loader.py", line 374, in load
    svc = import_service(
  File "/usr/local/lib/python3.10/site-packages/simple_di/__init__.py", line 139, in _
    return func(*_inject_args(bind.args), **_inject_kwargs(bind.kwargs))
  File "/usr/local/lib/python3.10/site-packages/bentoml/_internal/service/loader.py", line 137, in import_service
    module = importlib.import_module(module_name, package=working_dir)
  File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.10/site-packages/openllm/_service.py", line 22, in <module>
    runner = openllm.Runner(model, llm_config=llm_config, ensure_available=False, adapter_map=orjson.loads(adapter_map))
  File "/usr/local/lib/python3.10/site-packages/openllm/_llm.py", line 1007, in Runner
    runner = infer_auto_class(implementation).create_runner(model_name, llm_config=llm_config, ensure_available=ensure_available if ensure_available is not None else init_local, **attrs)
  File "/usr/local/lib/python3.10/site-packages/openllm/models/auto/factory.py", line 50, in create_runner
    return cls.for_model(model, model_id=model_id, **attrs).to_runner(**runner_attrs)
  File "/usr/local/lib/python3.10/site-packages/openllm/_llm.py", line 922, in to_runner
    raise RuntimeError(f"Failed to locate {self._bentomodel}:{err}") from None
  File "/usr/local/lib/python3.10/site-packages/openllm/_llm.py", line 780, in _bentomodel
    if self.__llm_bentomodel__ is None: self.__llm_bentomodel__ = openllm.serialisation.get(self)
  File "/usr/local/lib/python3.10/site-packages/openllm/serialisation/__init__.py", line 77, in caller
    return getattr(importlib.import_module(f".{llm.runtime}", __name__), fn)(llm, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/openllm/serialisation/transformers/__init__.py", line 140, in get
    raise err from None
  File "/usr/local/lib/python3.10/site-packages/openllm/serialisation/transformers/__init__.py", line 132, in get
    model = bentoml.models.get(llm.tag)
  File "/usr/local/lib/python3.10/site-packages/simple_di/__init__.py", line 139, in _
    return func(*_inject_args(bind.args), **_inject_kwargs(bind.kwargs))
  File "/usr/local/lib/python3.10/site-packages/bentoml/models.py", line 45, in get
    return _model_store.get(tag)
  File "/usr/local/lib/python3.10/site-packages/bentoml/_internal/store.py", line 146, in get
    raise NotFound(
bentoml.exceptions.NotFound: Model 'pt-baichuan-inc-baichuan-13b-chat:a4a558127068f2ce965aa56aeb826bf501a68970' is not found in BentoML store <osfs '/root/bentoml/models'>, you may need to run `bentoml models pull` first

🚀 Next step: run 'openllm build baichuan' to create a Bento for baichuan

Environment

Python: 3.10.12
CUDA: 11.2.2
openllm: 0.2.25
bentoml: 1.1.1

System information (Optional)

RAM: 256G
GPU: 4 x RTX 3090
Running in a Docker container

aarnphm commented 1 year ago

One side note: only CUDA 11.8 is fully supported.

Recently our scheduling strategy changed so that we only create one runner instance regardless of the number of devices. This behaviour is controlled via --workers-per-resource.

--device indicates which GPUs are available to the Runner; --workers-per-resource is probably what you want here.
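A sketch of what that could look like for your setup (the 0.5 value is illustrative: with two GPUs it yields a single worker spanning both):

# Illustrative values, not a tested invocation:
# 0.5 workers per GPU => one runner worker across two GPUs
openllm start baichuan --model-id baichuan-inc/Baichuan-13B-Chat --workers-per-resource 0.5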

Lukikay commented 1 year ago

Hi, thanks for your suggestion. I tried on another machine (RTX 4090 with CUDA 11.8); unfortunately, I got the same error: Error: [bentoml-cli] `serve` failed: Model 'pt-baichuan-inc-baichuan-13b-chat:a4a558127068f2ce965aa56aeb826bf501a68970' is not found in BentoML store <osfs '/root/bentoml/models'>, you may need to run `bentoml models pull` first

aarnphm commented 1 year ago

Can you show the whole stack trace?