bentoml / OpenLLM

Run any open-source LLM, such as Llama or Gemma, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

README outdated? #787

Closed jmformenti closed 11 months ago

jmformenti commented 11 months ago

Describe the bug

I'm trying some of the examples from the README but they don't work for me. I'm not sure whether this is due to a configuration issue on my side or an outdated README, e.g. starting the LLM server or the LangChain integration.

To reproduce

Start LLM Server

Steps

python3.10 -m venv .venv
source .venv/bin/activate
pip install openllm
openllm start facebook/opt-1.3b

Error

It is recommended to specify the backend explicitly. Cascading backend might lead to unexpected behaviour.
Traceback (most recent call last):
....
    llm = openllm.LLM[t.Any, t.Any](
  File "/usr/lib/python3.10/typing.py", line 957, in __call__
    result = self.__origin__(*args, **kwargs)
  File "/home/sauron/projects/sandbox/test/.venv/lib/python3.10/site-packages/openllm/_llm.py", line 205, in __init__
    quantise=getattr(self._Quantise, backend)(self, quantize),
TypeError: getattr(): attribute name must be string

Fix

pip install openllm[vllm]
openllm start facebook/opt-1.3b --backend vllm
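
As a side note, once the server does start with the vLLM backend, it can be sanity-checked through the OpenAI-compatible API the project advertises. A minimal sketch, assuming the default port 3000 and the standard /v1 routes (both assumptions, not verified against this exact OpenLLM version):

```
# Minimal sketch: query a locally running `openllm start` server through its
# OpenAI-compatible API. The port (3000) and the /v1 prefix are assumptions;
# adjust base_url to whatever `openllm start` actually reports.
from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')  # key is ignored locally

# Ask the server which model ids it exposes, then run a small completion.
model_id = client.models.list().data[0].id
completion = client.completions.create(
    model=model_id,
    prompt='What is the difference between a duck and a goose?',
    max_tokens=64,
)
print(completion.choices[0].text)
```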

LangChain integration

Steps

pip install langchain; \
cat <<EOF > openllm-langchain.py
from langchain.llms import OpenLLM

llm = OpenLLM(model_name='llama', model_id='meta-llama/Llama-2-7b-hf')

llm('What is the difference between a duck and a goose? And why there are so many Goose in Canada?')
EOF
python openllm-langchain.py

Error

NOT RECOMMENDED in production and SHOULD ONLY used for development.
...
Traceback (most recent call last):
...
    res = self._runner(prompt, **config.model_dump(flatten=True))
TypeError: 'LlamaRunner' object is not callable

Fix

Unknown
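
A possible workaround (just a sketch, not verified against these exact versions) might be to skip the in-process runner entirely and point LangChain's OpenLLM wrapper at an already running server via server_url:

```
# Sketch of a possible workaround: start the model separately, e.g.
#   openllm start meta-llama/Llama-2-7b-hf --backend vllm
# and let LangChain talk to that server instead of loading a local runner
# (the code path that raises "'LlamaRunner' object is not callable").
# The URL/port below is an assumption; use whatever `openllm start` prints.
from langchain.llms import OpenLLM

llm = OpenLLM(server_url='http://localhost:3000')

print(llm('What is the difference between a duck and a goose?'))
```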

Logs

No response

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.10
python: 3.10.13
platform: Linux-5.4.0-167-generic-x86_64-with-glibc2.31
uid_gid: 1000:1000

pip_packages
``` accelerate==0.25.0 aiohttp==3.9.1 aioprometheus==23.3.0 aiosignal==1.3.1 anyio==3.7.1 appdirs==1.4.4 asgiref==3.7.2 async-timeout==4.0.3 attrs==23.1.0 bentoml==1.1.10 bitsandbytes==0.41.3.post2 build==0.10.0 cattrs==23.1.2 certifi==2023.11.17 charset-normalizer==3.3.2 circus==0.18.0 click==8.1.7 click-option-group==0.5.6 cloudpickle==3.0.0 coloredlogs==15.0.1 contextlib2==21.6.0 cuda-python==12.3.0 dataclasses-json==0.6.3 datasets==2.15.0 deepmerge==1.1.0 Deprecated==1.2.14 dill==0.3.7 distlib==0.3.8 distro==1.8.0 einops==0.7.0 exceptiongroup==1.2.0 fastapi==0.105.0 fastcore==1.5.29 filelock==3.13.1 filetype==1.2.0 frozenlist==1.4.1 fs==2.4.16 fsspec==2023.10.0 ghapi==1.0.4 greenlet==3.0.2 grpcio==1.60.0 h11==0.14.0 httpcore==1.0.2 httptools==0.6.1 httpx==0.25.2 huggingface-hub==0.19.4 humanfriendly==10.0 idna==3.6 importlib-metadata==6.11.0 inflection==0.5.1 Jinja2==3.1.2 jsonpatch==1.33 jsonpointer==2.4 jsonschema==4.20.0 jsonschema-specifications==2023.11.2 langchain==0.0.350 langchain-community==0.0.3 langchain-core==0.1.1 langsmith==0.0.71 markdown-it-py==3.0.0 MarkupSafe==2.1.3 marshmallow==3.20.1 mdurl==0.1.2 mpmath==1.3.0 msgpack==1.0.7 multidict==6.0.4 multiprocess==0.70.15 mypy-extensions==1.0.0 networkx==3.2.1 ninja==1.11.1.1 numpy==1.26.2 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-ml-py==11.525.150 nvidia-nccl-cu12==2.18.1 nvidia-nvjitlink-cu12==12.3.101 nvidia-nvtx-cu12==12.1.105 openllm==0.4.40 openllm-client==0.4.40 openllm-core==0.4.40 opentelemetry-api==1.20.0 opentelemetry-instrumentation==0.41b0 opentelemetry-instrumentation-aiohttp-client==0.41b0 opentelemetry-instrumentation-asgi==0.41b0 opentelemetry-sdk==1.20.0 opentelemetry-semantic-conventions==0.41b0 opentelemetry-util-http==0.41b0 optimum==1.16.1 orjson==3.9.10 packaging==23.2 pandas==2.1.4 pathspec==0.12.1 Pillow==10.1.0 pip-requirements-parser==32.0.1 pip-tools==7.3.0 platformdirs==4.1.0 prometheus-client==0.19.0 protobuf==4.25.1 psutil==5.9.6 pyarrow==14.0.1 pyarrow-hotfix==0.6 pydantic==1.10.13 Pygments==2.17.2 pyparsing==3.1.1 pyproject_hooks==1.0.0 python-dateutil==2.8.2 python-dotenv==1.0.0 python-json-logger==2.0.7 python-multipart==0.0.6 pytz==2023.3.post1 PyYAML==6.0.1 pyzmq==25.1.2 quantile-python==1.1 ray==2.6.0 referencing==0.32.0 regex==2023.10.3 requests==2.31.0 rich==13.7.0 rpds-py==0.13.2 safetensors==0.4.1 schema==0.7.5 scipy==1.11.4 sentencepiece==0.1.99 simple-di==0.1.5 six==1.16.0 sniffio==1.3.0 SQLAlchemy==2.0.23 starlette==0.27.0 sympy==1.12 tenacity==8.2.3 tokenizers==0.15.0 tomli==2.0.1 torch==2.1.2 tornado==6.4 tqdm==4.66.1 transformers==4.36.1 triton==2.1.0 typing-inspect==0.9.0 typing_extensions==4.9.0 tzdata==2023.3 urllib3==2.1.0 uvicorn==0.24.0.post1 uvloop==0.19.0 virtualenv==20.25.0 vllm==0.2.5 watchfiles==0.21.0 websockets==12.0 wrapt==1.16.0 xformers==0.0.23.post1 xxhash==3.4.1 yarl==1.9.4 zipp==3.17.0 ```

System information (Optional)

No response

aarnphm commented 11 months ago

Brand new Python 3.10 virtualenv:

[Screenshot 2023-12-16 at 13 06 31]

As for the LangChain integration, I have an upstream PR for it.

jmformenti commented 11 months ago

Could you please share how you created the virtual environment? And the URL of the LangChain PR so I can track it? Thanks

aarnphm commented 11 months ago

https://github.com/langchain-ai/langchain/pull/12968

venv () {
    name="${1:-venv}"
    if [[ ! -d "$name" ]]
    then
        # first use: install virtualenv if missing, create the environment,
        # activate it, and pin protobuf below 3.20
        pip freeze | grep "virtualenv" &> /dev/null || pip install virtualenv
        python -m virtualenv "$name" --download
        source "$name/bin/activate"
        pip install "protobuf<3.20"
    else
        # environment already exists, just activate it
        source "$name/bin/activate"
    fi
}

jmformenti commented 11 months ago

Thanks!

Regarding the installation that still isn't working, this is the full stack trace:

It is recommended to specify the backend explicitly. Cascading backend might lead to unexpected behaviour.
Traceback (most recent call last):
  File "/home/user/projects/sandbox/test/.venv/bin/openllm", line 8, in <module>
    sys.exit(cli())
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 160, in wrapper
    return_value = func(*args, **attrs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 141, in wrapper
    return f(*args, **attrs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 366, in start_command
    llm = openllm.LLM[t.Any, t.Any](
  File "/usr/lib/python3.10/typing.py", line 957, in __call__
    result = self.__origin__(*args, **kwargs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/openllm/_llm.py", line 205, in __init__
    quantise=getattr(self._Quantise, backend)(self, quantize),
TypeError: getattr(): attribute name must be string

Anyway, I can fix it with:

pip install openllm[vllm]

Apparently this is a problem on my side, so I'm closing the issue.

kjain25 commented 10 months ago

Do you need CUDA to do `pip install openllm[vllm]`? I am getting the following error:

Downloading vllm-0.2.6.tar.gz (167 kB)
   ---------------------------------------- 167.2/167.2 kB 2.5 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [23 lines of output]
    C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\torch\nn\modules\transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:84.)
      device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
    Traceback (most recent call last):
      File "C:\Users\13318\anaconda3\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in
        main()
      File "C:\Users\13318\anaconda3\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "C:\Users\13318\anaconda3\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
        return hook(config_settings)
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\setuptools\build_meta.py", line 325, in get_requires_for_build_wheel
        return self._get_build_requires(config_settings, requirements=['wheel'])
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\setuptools\build_meta.py", line 295, in _get_build_requires
        self.run_setup()
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\setuptools\build_meta.py", line 311, in run_setup
        exec(code, locals())
      File "", line 230, in
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\torch\utils\cpp_extension.py", line 1076, in CUDAExtension
        library_dirs += library_paths(cuda=True)
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\torch\utils\cpp_extension.py", line 1210, in library_paths
        paths.append(_join_cuda_home(lib_dir))
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\torch\utils\cpp_extension.py", line 2416, in _join_cuda_home
        raise OSError('CUDA_HOME environment variable is not set. '
    OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
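
Judging from the output above, yes: the vllm extra pulls in the vLLM source distribution, whose build step compiles CUDA extensions, and it aborts precisely because `CUDA_HOME` is not set. As a guess (not verified here), either point `CUDA_HOME` at an installed CUDA toolkit before rerunning `pip install "openllm[vllm]"`, or skip the vllm extra and stay on plain `pip install openllm` with the default PyTorch backend. Note also that vLLM does not support Windows natively, which the `C:\Users\...` paths suggest is the platform here.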