Describe the bug
I can specify the OpenLLM configuration by using the start command:
CUDA_VISIBLE_DEVICES=0,1 TRANSFORMERS_OFFLINE=1 openllm start mistral --model-id mymodel --dtype float16 --gpu-memory-utilization 0.95 --workers-per-resource 0.5
However, I can't change these values when I use the service later on, i.e. after openllm start followed by build:
openllm build mymodel --backend vllm --serialization safetensors
bentoml containerize mymodel-service:12345 --opt progress=plain
I have the following Bento service:
$ bentoml list
 Tag                    Size       Model Size  Creation Time
 mymodel-service:12345  56.29 KiB  XX GiB      2024-02-12 13:11:19
Problem: I can't pass these values into the service! Even the environment variable MAX_MODEL_LEN is not reflected in llm._max_model_len. I also tried changing the bento.yaml file and then running bentoml serve on the service, but the value is still not reflected in llm._max_model_len. I have the same config at the following path in the BentoML store:
bentoml/bentos/mymodel-service/12345/bento.yaml
$ bentoml get mymodel-service:12345
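For reference, this is a minimal sketch of the kind of invocation I expect to pick up the value, assuming the service reads MAX_MODEL_LEN at startup (the value 4096 is just an example, and the image tag assumes the default tag produced by the containerize step above):

# export the variable before serving the bento directly
MAX_MODEL_LEN=4096 bentoml serve mymodel-service:12345

# or pass it into the container built with bentoml containerize
docker run -e MAX_MODEL_LEN=4096 mymodel-service:12345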
To reproduce
No response
Logs
No response
Environment
$ bentoml -v
bentoml, version 1.1.11

$ openllm -v
openllm, 0.4.45.dev2 (compiled: False) Python (CPython) 3.11.7

System information (Optional)
No response