bentoml / OpenLLM

Run any open-source LLM, such as Llama or Gemma, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

README outdated? #787

Closed jmformenti closed 11 months ago

jmformenti commented 11 months ago

Describe the bug

I'm trying some of the examples from the README but they don't work for me. I'm not sure whether this is due to a configuration issue on my side or an outdated README, e.g. starting the LLM server or the LangChain integration.

To reproduce

Start LLM Server

Steps

python3.10 -m venv .venv
source .venv/bin/activate
pip install openllm
openllm start facebook/opt-1.3b

Error

It is recommended to specify the backend explicitly. Cascading backend might lead to unexpected behaviour.
Traceback (most recent call last):
....
    llm = openllm.LLM[t.Any, t.Any](
  File "/usr/lib/python3.10/typing.py", line 957, in __call__
    result = self.__origin__(*args, **kwargs)
  File "/home/sauron/projects/sandbox/test/.venv/lib/python3.10/site-packages/openllm/_llm.py", line 205, in __init__
    quantise=getattr(self._Quantise, backend)(self, quantize),
TypeError: getattr(): attribute name must be string

Fix

pip install openllm[vllm]
openllm start facebook/opt-1.3b --backend vllm
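
As a side note, once the server does start with the vLLM backend, it can be sanity-checked through the OpenAI-compatible API the project advertises. A minimal sketch, assuming the default port 3000 and the standard /v1 routes (both assumptions, not verified against this exact OpenLLM version):

```
# Minimal sketch: query a locally running `openllm start` server through its
# OpenAI-compatible API. The port (3000) and the /v1 prefix are assumptions;
# adjust base_url to whatever `openllm start` actually reports.
from openai import OpenAI

client = OpenAI(base_url='http://localhost:3000/v1', api_key='na')  # key is ignored locally

# Ask the server which model ids it exposes, then run a small completion.
model_id = client.models.list().data[0].id
completion = client.completions.create(
    model=model_id,
    prompt='What is the difference between a duck and a goose?',
    max_tokens=64,
)
print(completion.choices[0].text)
```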

LangChain integration

Steps

pip install langchain; \
cat <<EOF > openllm-langchain.py
from langchain.llms import OpenLLM

llm = OpenLLM(model_name='llama', model_id='meta-llama/Llama-2-7b-hf')

llm('What is the difference between a duck and a goose? And why there are so many Goose in Canada?')
EOF
python openllm-langchain.py

Error

NOT RECOMMENDED in production and SHOULD ONLY used for development.
...
Traceback (most recent call last):
...
    res = self._runner(prompt, **config.model_dump(flatten=True))
TypeError: 'LlamaRunner' object is not callable

Fix

Unknown
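
A possible workaround (just a sketch, not verified against these exact versions) might be to skip the in-process runner entirely and point LangChain's OpenLLM wrapper at an already running server via server_url:

```
# Sketch of a possible workaround: start the model separately, e.g.
#   openllm start meta-llama/Llama-2-7b-hf --backend vllm
# and let LangChain talk to that server instead of loading a local runner
# (the code path that raises "'LlamaRunner' object is not callable").
# The URL/port below is an assumption; use whatever `openllm start` prints.
from langchain.llms import OpenLLM

llm = OpenLLM(server_url='http://localhost:3000')

print(llm('What is the difference between a duck and a goose?'))
```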

Logs

No response

Environment

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''

System information

bentoml: 1.1.10
python: 3.10.13
platform: Linux-5.4.0-167-generic-x86_64-with-glibc2.31
uid_gid: 1000:1000

pip_packages
``` accelerate==0.25.0 aiohttp==3.9.1 aioprometheus==23.3.0 aiosignal==1.3.1 anyio==3.7.1 appdirs==1.4.4 asgiref==3.7.2 async-timeout==4.0.3 attrs==23.1.0 bentoml==1.1.10 bitsandbytes==0.41.3.post2 build==0.10.0 cattrs==23.1.2 certifi==2023.11.17 charset-normalizer==3.3.2 circus==0.18.0 click==8.1.7 click-option-group==0.5.6 cloudpickle==3.0.0 coloredlogs==15.0.1 contextlib2==21.6.0 cuda-python==12.3.0 dataclasses-json==0.6.3 datasets==2.15.0 deepmerge==1.1.0 Deprecated==1.2.14 dill==0.3.7 distlib==0.3.8 distro==1.8.0 einops==0.7.0 exceptiongroup==1.2.0 fastapi==0.105.0 fastcore==1.5.29 filelock==3.13.1 filetype==1.2.0 frozenlist==1.4.1 fs==2.4.16 fsspec==2023.10.0 ghapi==1.0.4 greenlet==3.0.2 grpcio==1.60.0 h11==0.14.0 httpcore==1.0.2 httptools==0.6.1 httpx==0.25.2 huggingface-hub==0.19.4 humanfriendly==10.0 idna==3.6 importlib-metadata==6.11.0 inflection==0.5.1 Jinja2==3.1.2 jsonpatch==1.33 jsonpointer==2.4 jsonschema==4.20.0 jsonschema-specifications==2023.11.2 langchain==0.0.350 langchain-community==0.0.3 langchain-core==0.1.1 langsmith==0.0.71 markdown-it-py==3.0.0 MarkupSafe==2.1.3 marshmallow==3.20.1 mdurl==0.1.2 mpmath==1.3.0 msgpack==1.0.7 multidict==6.0.4 multiprocess==0.70.15 mypy-extensions==1.0.0 networkx==3.2.1 ninja==1.11.1.1 numpy==1.26.2 nvidia-cublas-cu12==12.1.3.1 nvidia-cuda-cupti-cu12==12.1.105 nvidia-cuda-nvrtc-cu12==12.1.105 nvidia-cuda-runtime-cu12==12.1.105 nvidia-cudnn-cu12==8.9.2.26 nvidia-cufft-cu12==11.0.2.54 nvidia-curand-cu12==10.3.2.106 nvidia-cusolver-cu12==11.4.5.107 nvidia-cusparse-cu12==12.1.0.106 nvidia-ml-py==11.525.150 nvidia-nccl-cu12==2.18.1 nvidia-nvjitlink-cu12==12.3.101 nvidia-nvtx-cu12==12.1.105 openllm==0.4.40 openllm-client==0.4.40 openllm-core==0.4.40 opentelemetry-api==1.20.0 opentelemetry-instrumentation==0.41b0 opentelemetry-instrumentation-aiohttp-client==0.41b0 opentelemetry-instrumentation-asgi==0.41b0 opentelemetry-sdk==1.20.0 opentelemetry-semantic-conventions==0.41b0 opentelemetry-util-http==0.41b0 optimum==1.16.1 orjson==3.9.10 packaging==23.2 pandas==2.1.4 pathspec==0.12.1 Pillow==10.1.0 pip-requirements-parser==32.0.1 pip-tools==7.3.0 platformdirs==4.1.0 prometheus-client==0.19.0 protobuf==4.25.1 psutil==5.9.6 pyarrow==14.0.1 pyarrow-hotfix==0.6 pydantic==1.10.13 Pygments==2.17.2 pyparsing==3.1.1 pyproject_hooks==1.0.0 python-dateutil==2.8.2 python-dotenv==1.0.0 python-json-logger==2.0.7 python-multipart==0.0.6 pytz==2023.3.post1 PyYAML==6.0.1 pyzmq==25.1.2 quantile-python==1.1 ray==2.6.0 referencing==0.32.0 regex==2023.10.3 requests==2.31.0 rich==13.7.0 rpds-py==0.13.2 safetensors==0.4.1 schema==0.7.5 scipy==1.11.4 sentencepiece==0.1.99 simple-di==0.1.5 six==1.16.0 sniffio==1.3.0 SQLAlchemy==2.0.23 starlette==0.27.0 sympy==1.12 tenacity==8.2.3 tokenizers==0.15.0 tomli==2.0.1 torch==2.1.2 tornado==6.4 tqdm==4.66.1 transformers==4.36.1 triton==2.1.0 typing-inspect==0.9.0 typing_extensions==4.9.0 tzdata==2023.3 urllib3==2.1.0 uvicorn==0.24.0.post1 uvloop==0.19.0 virtualenv==20.25.0 vllm==0.2.5 watchfiles==0.21.0 websockets==12.0 wrapt==1.16.0 xformers==0.0.23.post1 xxhash==3.4.1 yarl==1.9.4 zipp==3.17.0 ```

System information (Optional)

No response

aarnphm commented 11 months ago

Brand new Python 3.10 virtualenv:

[Screenshot 2023-12-16 at 13 06 31]

As for the LangChain integration, I have an upstream PR for it.

jmformenti commented 11 months ago

Could you please share how you created the virtual environment? And the URL of the LangChain PR so I can track it? Thanks

aarnphm commented 11 months ago

https://github.com/langchain-ai/langchain/pull/12968

venv () {
    name="${1:-venv}"
    if [[ ! -d "$name" ]]
    then
        # first use: install virtualenv if missing, create the environment,
        # activate it, and pin protobuf below 3.20
        pip freeze | grep "virtualenv" &> /dev/null || pip install virtualenv
        python -m virtualenv "$name" --download
        source "$name/bin/activate"
        pip install "protobuf<3.20"
    else
        # environment already exists, just activate it
        source "$name/bin/activate"
    fi
}

jmformenti commented 11 months ago

Thanks!

Regarding the installation that still isn't working, this is the full stack trace:

It is recommended to specify the backend explicitly. Cascading backend might lead to unexpected behaviour.
Traceback (most recent call last):
  File "/home/user/projects/sandbox/test/.venv/bin/openllm", line 8, in <module>
    sys.exit(cli())
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 160, in wrapper
    return_value = func(*args, **attrs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 141, in wrapper
    return f(*args, **attrs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/openllm_cli/entrypoint.py", line 366, in start_command
    llm = openllm.LLM[t.Any, t.Any](
  File "/usr/lib/python3.10/typing.py", line 957, in __call__
    result = self.__origin__(*args, **kwargs)
  File "/home/user/projects/sandbox/test/.venv/lib/python3.10/site-packages/openllm/_llm.py", line 205, in __init__
    quantise=getattr(self._Quantise, backend)(self, quantize),
TypeError: getattr(): attribute name must be string

Anyway, I can fix it with:

pip install openllm[vllm]

Apparently this is a problem on my side, so I'm closing the issue.

kjain25 commented 10 months ago

Do you need CUDA to do `pip install openllm[vllm]`? I am getting the following error:

Downloading vllm-0.2.6.tar.gz (167 kB)
   ---------------------------------------- 167.2/167.2 kB 2.5 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [23 lines of output]
    C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\torch\nn\modules\transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:84.)
      device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
    Traceback (most recent call last):
      File "C:\Users\13318\anaconda3\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in
        main()
      File "C:\Users\13318\anaconda3\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "C:\Users\13318\anaconda3\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
        return hook(config_settings)
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\setuptools\build_meta.py", line 325, in get_requires_for_build_wheel
        return self._get_build_requires(config_settings, requirements=['wheel'])
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\setuptools\build_meta.py", line 295, in _get_build_requires
        self.run_setup()
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\setuptools\build_meta.py", line 311, in run_setup
        exec(code, locals())
      File "", line 230, in
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\torch\utils\cpp_extension.py", line 1076, in CUDAExtension
        library_dirs += library_paths(cuda=True)
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\torch\utils\cpp_extension.py", line 1210, in library_paths
        paths.append(_join_cuda_home(lib_dir))
      File "C:\Users\13318\AppData\Local\Temp\pip-build-env-v6ffr_4x\overlay\Lib\site-packages\torch\utils\cpp_extension.py", line 2416, in _join_cuda_home
        raise OSError('CUDA_HOME environment variable is not set. '
    OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
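
Judging from the output above, yes: the vllm extra pulls in the vLLM source distribution, whose build step compiles CUDA extensions, and it aborts precisely because `CUDA_HOME` is not set. As a guess (not verified here), either point `CUDA_HOME` at an installed CUDA toolkit before rerunning `pip install "openllm[vllm]"`, or skip the vllm extra and stay on plain `pip install openllm` with the default PyTorch backend. Note also that vLLM does not support Windows natively, which the `C:\Users\...` paths suggest is the platform here.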