bentoml / OpenLLM

Run any open-source LLMs, such as Llama 3.1 and Gemma, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

Cannot use llama model offline #517

Closed: jeremythuon closed this 10 months ago

jeremythuon commented 11 months ago

Describe the bug

Hello,

I would like to try OpenLLM offline, but I can't get it to work. For my test, I downloaded the huggyllama--llama-7b model on another computer with internet access and copied the Bento home over to the offline machine. Then I try to start it with the command:

```
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 openllm start llama --model-id bentoml/models/pt-huggyllama--llama-7b/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16
```

I get the following error:

```
Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.22s/it]
Error: [bentoml-cli] serve failed: Failed to generate a valid tag for llama with 'model_id=bentoml/models/pt-huggyllama--llama-7b/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/' (lookup to see its traceback): Can't load the configuration of 'bentoml/models/pt-huggyllama--llama-7b/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'bentoml/models/pt-huggyllama--llama-7b/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/' is the correct path to a directory containing a config.json file
```
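For reference, here is a minimal sanity check I can run on the offline machine (a sketch, not OpenLLM's own code; the path and environment variables are the ones from above, and it only assumes transformers is installed). The error suggests the path is being treated as a Hugging Face model directory, so transformers should be able to load it directly:

```python
# Sketch: verify the copied directory is a complete HF model folder
# that transformers can load locally, with offline mode forced.
import os
os.environ["TRANSFORMERS_OFFLINE"] = "1"
os.environ["HF_DATASETS_OFFLINE"] = "1"

from transformers import AutoConfig, AutoTokenizer

# Path taken from the command above.
model_dir = "/home/outscale/bentoml/models/pt-huggyllama--llama-7b/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16"

config = AutoConfig.from_pretrained(model_dir)       # requires config.json in model_dir
tokenizer = AutoTokenizer.from_pretrained(model_dir)  # requires tokenizer files
print(type(config).__name__, type(tokenizer).__name__)
```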

Can you help me, please?

Thanks

To reproduce

No response

Logs

No response

Environment

```
bentoml        1.1.7
openllm        0.3.9
openllm-client 0.3.9
openllm-core   0.3.9
```

System information (Optional)

No response

aarnphm commented 11 months ago

Can you try the following?

```
OPENLLM_USE_LOCAL_LATEST=True openllm start llama --model-id huggyllama/llama-7b
```
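For clarity, the same invocation as a minimal Python sketch with the offline variables combined explicitly (the env var, command, and flags are exactly the ones from this thread; the rest is standard library):

```python
# Sketch: run the suggested command with all offline-related variables set.
import os
import subprocess

env = dict(
    os.environ,
    OPENLLM_USE_LOCAL_LATEST="True",
    HF_DATASETS_OFFLINE="1",
    TRANSFORMERS_OFFLINE="1",
)
subprocess.run(
    ["openllm", "start", "llama", "--model-id", "huggyllama/llama-7b"],
    env=env,
    check=True,  # raise if the server process exits with an error
)
```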

jeremythuon commented 11 months ago

Hi @aarnphm, I tried adding your env var, but it doesn't work. OpenLLM still tries to find config.json:

```
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like huggyllama/llama-7b is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
```

I checked with strace, and OpenLLM looks at the right path:

```
stat("/home/outscale/bentoml/models/pt-huggyllama--llama-7b/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/model.yaml
```

```
(llm) [outscale@tata ~]$ ls -lah /home/outscale/bentoml/models/pt-huggyllama--llama-7b/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/
total 13G
drwxr-xr-x 2 outscale outscale 4.0K Oct 20 08:55 .
drwxr-xr-x 3 outscale outscale   68 Oct 19 12:46 ..
-rw-r--r-- 1 outscale outscale   42 Oct 19 12:37 added_tokens.json
-rw-r--r-- 1 outscale outscale  594 Oct 20 08:51 config.json
-rw-r--r-- 1 outscale outscale  137 Oct 19 12:37 generation_config.json
-rw-r--r-- 1 outscale outscale  11K Oct 19 12:37 LICENSE
-rw-r--r-- 1 outscale outscale 9.3G Oct 19 12:46 model-00001-of-00002.safetensors
-rw-r--r-- 1 outscale outscale 3.3G Oct 19 12:39 model-00002-of-00002.safetensors
-rw-r--r-- 1 outscale outscale  27K Oct 19 12:37 model.safetensors.index.json
-rw-r--r-- 1 outscale outscale  951 Oct 19 12:46 model.yaml
-rw-r--r-- 1 outscale outscale  27K Oct 19 12:37 pytorch_model.bin.index.json
-rw-r--r-- 1 outscale outscale  411 Oct 19 12:37 special_tokens_map.json
-rw-r--r-- 1 outscale outscale  700 Oct 19 12:37 tokenizer_config.json
-rw-r--r-- 1 outscale outscale 1.8M Oct 19 12:37 tokenizer.json
-rw-r--r-- 1 outscale outscale 489K Oct 19 12:37 tokenizer.model
```
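Since the store layout above looks intact (model.yaml and config.json are both present), another quick check is whether BentoML itself can see the model. A minimal sketch, assuming BENTOML_HOME is /home/outscale/bentoml (inferred from the paths above):

```python
# Sketch: list the models registered in the local BentoML store.
import os
os.environ["BENTOML_HOME"] = "/home/outscale/bentoml"  # assumption based on the paths above

import bentoml

for model in bentoml.models.list():
    print(model.tag)  # expect the pt-huggyllama--llama-7b tag to appear here
```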

Do you have an idea?

aarnphm commented 11 months ago

Did you pass in HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 as well? This might not work, but I will definitely take a look. Thanks for reporting it.

jeremythuon commented 11 months ago

It's the same result with or without the variables. Do you think it comes from Hugging Face?

yingjie-han commented 10 months ago

> Did you pass in HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 as well? This might not work, but I will definitely take a look. Thanks for reporting it.

Hi @aarnphm, it is not working for me. Following is the log:

```
$ HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 openllm start baichuan --model-id /home/yingjie/openllm/baichuan2-13b --backend pt
Traceback (most recent call last):
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/serialisation/transformers/__init__.py", line 147, in get
    model = bentoml.models.get(llm.tag)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/simple_di/__init__.py", line 139, in _
    return func(*_inject_args(bind.args), **_inject_kwargs(bind.kwargs))
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/bentoml/models.py", line 45, in get
    return _model_store.get(tag)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/bentoml/_internal/store.py", line 158, in get
    raise NotFound(
bentoml.exceptions.NotFound: Model 'pt-baichuan2-13b:08c4d4d5d8625c6702b44beca2570febec83a4ae' is not found in BentoML store <osfs '/root/bentoml/models'>, you may need to run bentoml models pull first

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/cli/entrypoint.py", line 416, in import_command
    _ref = openllm.serialisation.get(llm)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/serialisation/__init__.py", line 75, in caller
    return getattr(importlib.import_module(f'.{serde}', __name__), fn)(llm, *args, **kwargs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/serialisation/transformers/__init__.py", line 155, in get
    raise openllm.exceptions.OpenLLMException(f'Failed while getting stored artefact (lookup for traceback):\n{err}') from err
openllm_core.exceptions.OpenLLMException: Failed while getting stored artefact (lookup for traceback): Model 'pt-baichuan2-13b:08c4d4d5d8625c6702b44beca2570febec83a4ae' is not found in BentoML store <osfs '/root/bentoml/models'>, you may need to run bentoml models pull first

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/serialisation/transformers/__init__.py", line 147, in get
    model = bentoml.models.get(llm.tag)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/simple_di/__init__.py", line 139, in _
    return func(*_inject_args(bind.args), **_inject_kwargs(bind.kwargs))
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/bentoml/models.py", line 45, in get
    return _model_store.get(tag)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/bentoml/_internal/store.py", line 158, in get
    raise NotFound(
bentoml.exceptions.NotFound: Model 'pt-baichuan2-13b:08c4d4d5d8625c6702b44beca2570febec83a4ae' is not found in BentoML store <osfs '/root/bentoml/models'>, you may need to run bentoml models pull first

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yingjie/openllm/v_openllm_bc/bin/openllm", line 8, in <module>
    sys.exit(cli())
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/cli/entrypoint.py", line 196, in wrapper
    return_value = func(*args, **attrs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/cli/entrypoint.py", line 178, in wrapper
    return f(*args, **attrs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/cli/_factory.py", line 179, in start_cmd
    llm = openllm.utils.infer_auto_class(env['backend_value']).for_model(model,
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/models/auto/factory.py", line 52, in for_model
    if ensure_available: llm.save_pretrained()
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/_llm.py", line 672, in save_pretrained
    return openllm.import_model(self.config['start_name'],
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/cli/_sdk.py", line 262, in _import_model
    return import_command.main(args=args, standalone_mode=False)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/cli/entrypoint.py", line 196, in wrapper
    return_value = func(*args, **attrs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/cli/entrypoint.py", line 178, in wrapper
    return f(*args, **attrs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/cli/entrypoint.py", line 422, in import_command
    _ref = openllm.serialisation.get(llm, auto_import=True)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/serialisation/__init__.py", line 75, in caller
    return getattr(importlib.import_module(f'.{serde}', __name__), fn)(llm, *args, **kwargs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/serialisation/transformers/__init__.py", line 154, in get
    if auto_import: return import_model(llm, trust_remote_code=llm.trust_remote_code)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/simple_di/__init__.py", line 139, in _
    return func(*_inject_args(bind.args), **_inject_kwargs(bind.kwargs))
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/openllm/serialisation/transformers/__init__.py", line 94, in import_model
    tokenizer = infer_tokenizers_from_llm(llm).from_pretrained(llm.model_id, trust_remote_code=trust_remote_code, **hub_attrs, **tokenizer_attrs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 738, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
    return cls._from_pretrained(
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan2-13b/tokenization_baichuan.py", line 71, in __init__
    super().__init__(
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/home/yingjie/openllm/v_openllm_bc/lib64/python3.8/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan2-13b/tokenization_baichuan.py", line 105, in get_vocab
    vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan2-13b/tokenization_baichuan.py", line 101, in vocab_size
    return self.sp_model.get_piece_size()
AttributeError: 'BaichuanTokenizer' object has no attribute 'sp_model'
```
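Note: the final AttributeError appears to be a separate problem from offline mode. The traceback shows that the transformers base constructor calls _add_tokens() (and thus get_vocab()) before BaichuanTokenizer's own __init__ body has set self.sp_model, which is an init-order incompatibility between newer transformers releases and Baichuan's custom tokenizer code. A self-contained illustration of that mechanism (not Baichuan's actual code, just a sketch of the failure mode):

```python
# Illustration only: why get_vocab() can run before sp_model exists
# when the base __init__ queries the vocabulary eagerly.
class BaseTokenizer:
    def __init__(self):
        # Stands in for transformers calling _add_tokens() -> get_vocab()
        # from inside the base constructor.
        self.get_vocab()

class CustomTokenizer(BaseTokenizer):
    def __init__(self):
        super().__init__()        # get_vocab() fires here...
        self.sp_model = object()  # ...but sp_model is only set here

    def get_vocab(self):
        return {str(self.sp_model): 0}  # touches sp_model too early

CustomTokenizer()  # AttributeError: 'CustomTokenizer' object has no attribute 'sp_model'
```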

aarnphm commented 10 months ago

will track this under #419

aarnphm commented 10 months ago

We have tested this, and it seems to work for Llama offline. Please try again. The Baichuan model will be tracked in a new thread instead.