bentoml / OpenLLM

Run any open-source LLMs, such as Llama and Gemma, as OpenAI-compatible API endpoints in the cloud.
https://bentoml.com
Apache License 2.0

mpt is listed under supported models but not available in openllm build command #99

Closed: bvandorf closed this issue 1 year ago

bvandorf commented 1 year ago

Describe the bug

mpt is listed under supported models but is not available in the openllm build command. This is the error message:

openllm build mpt

Usage: openllm build [OPTIONS] {flan-t5|dolly-v2|chatglm|starcoder|falcon|stablelm|opt}
Try 'openllm build -h' for help.

Error: Invalid value for '{flan-t5|dolly-v2|chatglm|starcoder|falcon|stablelm|opt}': 'mpt' is not one of 'flan-t5', 'dolly-v2', 'chatglm', 'starcoder', 'falcon', 'stablelm', 'opt'.

To reproduce

pip install "openllm[mpt]"
openllm build mpt
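
A quick way to confirm what the installed CLI actually supports (illustrative sketch, not part of the original report; it assumes the 0.1.x CLI exposes a `models` subcommand):

```bash
# Diagnostic sketch only.
pip show openllm       # the environment below reports openllm==0.1.17
openllm models         # assumed subcommand listing models the installed CLI knows about
openllm build -h       # prints the model choices accepted by `build`
```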

Logs

No response

Environment

bentoml env

Environment variable

BENTOML_DEBUG=''
BENTOML_QUIET=''
BENTOML_BUNDLE_LOCAL_BUILD=''
BENTOML_DO_NOT_TRACK=''
BENTOML_CONFIG=''
BENTOML_CONFIG_OPTIONS=''
BENTOML_PORT=''
BENTOML_HOST=''
BENTOML_API_WORKERS=''
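
All of the BentoML variables above are unset. For context, a minimal sketch of how two of them are typically set when debugging (values are illustrative and were not set in this report):

```bash
# Illustrative only; neither variable was set in the reported environment.
export BENTOML_DEBUG=true   # enable verbose BentoML/OpenLLM logging
export BENTOML_PORT=3000    # port used when serving the built Bento
```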

System information

bentoml: 1.0.23
python: 3.11.0
platform: Windows-10-10.0.19045-SP0
is_window_admin: True

pip_packages
``` absl-py==1.4.0 accelerate==0.20.3 aiofiles==23.1.0 aiohttp==3.8.4 aiolimiter==1.1.0 aiosignal==1.3.1 altair==5.0.1 anyio==3.6.2 appdirs==1.4.4 asgiref==3.7.2 asttokens==2.2.1 astunparse==1.6.3 async-timeout==4.0.2 asyncio==3.4.3 attrs==23.1.0 azure-core==1.27.0 azure-cosmos==4.3.1 azureml==0.2.7 backcall==0.2.0 beautifulsoup4==4.12.2 bentoml==1.0.23 bitsandbytes==0.39.0 blinker==1.6.2 Brotli==1.0.9 bs4==0.0.1 build==0.10.0 cachetools==5.3.1 cattrs==23.1.2 certifi==2022.12.7 cffi==1.15.1 charset-normalizer==3.1.0 circus==0.18.0 click==8.1.3 click-option-group==0.5.6 cloudpickle==2.2.1 colorama==0.4.6 coloredlogs==15.0.1 comm==0.1.2 contextlib2==21.6.0 contourpy==1.0.7 cryptography==41.0.1 cycler==0.11.0 dataclasses-json==0.5.8 datasets==2.13.0 debugpy==1.6.6 decorator==5.1.1 deepmerge==1.1.0 Deprecated==1.2.14 dill==0.3.6 diskcache==5.6.1 duckduckgo-search==3.8.3 einops==0.6.1 executing==1.2.0 faiss-cpu==1.7.4 fastapi==0.95.1 ffmpy==0.3.0 filelock==3.12.2 filetype==1.2.0 Flask==2.3.2 Flask-SQLAlchemy==3.0.5 flatbuffers==23.5.26 flexgen==0.1.7 fonttools==4.39.4 frozenlist==1.3.3 fs==2.4.16 fsspec==2023.6.0 gast==0.4.0 google-auth==2.19.1 google-auth-oauthlib==1.0.0 google-pasta==0.2.0 gptcache==0.1.32 gradio==3.33.1 gradio_client==0.2.5 greenlet==2.0.2 grpcio==1.54.2 grpcio-health-checking==1.48.2 guidance==0.0.63 h11==0.14.0 h2==4.1.0 h5py==3.8.0 hpack==4.0.0 httpcore==0.17.2 httpx==0.24.1 huggingface-hub==0.15.1 humanfriendly==10.0 hyperframe==6.0.1 idna==3.4 importlib-metadata==6.0.1 inflection==0.5.1 ipykernel==6.21.3 ipython==8.11.0 itsdangerous==2.1.2 jaconv==0.3.4 jamo==0.4.1 jax==0.4.12 jedi==0.18.2 Jinja2==3.1.2 joblib==1.2.0 jsonschema==4.17.3 jupyter_client==8.0.3 jupyter_core==5.2.0 keras==2.12.0 kiwisolver==1.4.4 langchain==0.0.196 langchainplus-sdk==0.0.11 libclang==16.0.0 linkify-it-py==2.0.2 llama-cpp-python==0.1.62 lxml==4.9.2 Markdown==3.4.3 markdown-it-py==2.2.0 MarkupSafe==2.1.3 marshmallow==3.19.0 marshmallow-enum==1.5.1 matplotlib==3.7.1 matplotlib-inline==0.1.6 mdit-py-plugins==0.3.3 mdurl==0.1.2 ml-dtypes==0.2.0 mpmath==1.3.0 msal==1.22.0 msgpack==1.0.5 multidict==6.0.4 multiprocess==0.70.14 mypy-extensions==1.0.0 nest-asyncio==1.5.6 networkx==3.0 numexpr==2.8.4 numpy==1.23.5 oauthlib==3.2.2 openai==0.27.8 openapi-schema-pydantic==1.2.4 openllm==0.1.17 opentelemetry-api==1.17.0 opentelemetry-instrumentation==0.38b0 opentelemetry-instrumentation-aiohttp-client==0.38b0 opentelemetry-instrumentation-asgi==0.38b0 opentelemetry-instrumentation-grpc==0.38b0 opentelemetry-sdk==1.17.0 opentelemetry-semantic-conventions==0.38b0 opentelemetry-util-http==0.38b0 opt-einsum==3.3.0 optimum==1.9.0 orjson==3.9.1 packaging==23.0 pandas==2.0.1 parso==0.8.3 pathspec==0.11.1 peft @ git+https://github.com/huggingface/peft@03eb378eb914fbee709ff7c86ba5b1d033b89524 pesq==0.0.4 pickleshare==0.7.5 pika==1.3.2 Pillow==9.5.0 pip-requirements-parser==32.0.1 pip-tools==6.13.0 platformdirs==3.1.1 playwright==1.35.0 prometheus-client==0.17.0 prompt-toolkit==3.0.38 protobuf==3.20.3 psutil==5.9.4 PuLP==2.7.0 pure-eval==0.2.2 pyarrow==12.0.1 pyasn1==0.5.0 pyasn1-modules==0.3.0 pycparser==2.21 pydantic==1.10.7 pydocumentdb==2.3.5 pydub==0.25.1 pyee==9.0.4 Pygments==2.14.0 pygtrie==2.5.0 PyJWT==2.7.0 PyMySQL==1.1.0 pynvml==11.5.0 pyparsing==3.0.9 pyproject_hooks==1.0.0 pyre-extensions==0.0.29 pyreadline3==3.4.1 pyrsistent==0.19.3 python-dateutil==2.8.2 python-json-logger==2.0.7 python-multipart==0.0.6 pytz==2023.3 pywin32==305 PyYAML==6.0 pyzmq==25.0.0 regex==2023.6.3 requests==2.29.0 
requests-oauthlib==1.3.1 rich==13.4.2 rsa==4.9 safetensors==0.3.1 schema==0.7.5 scikit-learn==1.2.2 scipy==1.10.1 semantic-version==2.10.0 sentencepiece==0.1.99 simple-di==0.1.5 six==1.16.0 sniffio==1.3.0 socksio==1.0.0 soupsieve==2.4.1 SQLAlchemy==2.0.16 stack-data==0.6.2 starlette==0.26.1 sympy==1.12 tabulate==0.9.0 tenacity==8.2.2 tensorboard==2.12.3 tensorboard-data-server==0.7.0 tensorflow==2.12.0 tensorflow-estimator==2.12.0 tensorflow-intel==2.12.0 tensorflow-io-gcs-filesystem==0.31.0 termcolor==2.3.0 threadpoolctl==3.1.0 tiktoken==0.4.0 tokenizers==0.13.3 toolz==0.12.0 torch==2.0.1+cu117 torchaudio==2.0.2+cu117 torchvision==0.15.2+cu117 tornado==6.2 tqdm==4.65.0 traitlets==5.9.0 transformers==4.30.2 typing-inspect==0.9.0 typing_extensions==4.5.0 tzdata==2023.3 uc-micro-py==1.0.2 urllib3==1.26.15 uvicorn==0.22.0 watchfiles==0.19.0 wcwidth==0.2.6 websockets==11.0.3 Werkzeug==2.3.6 wrapt==1.14.1 xformers==0.0.20 xxhash==3.2.0 yarl==1.9.2 zipp==3.15.0 ```

System information (Optional)

No response

aarnphm commented 1 year ago

Usage: openllm build [OPTIONS] {flan-t5|dolly-v2|chatglm|starcoder|falcon|stablelm|opt|mpt}

  Package a given models into a Bento.

  $ openllm build flan-t5 --model-id google/flan-t5-large

  > NOTE: To run a container built from this Bento with GPU support, make sure
  > to have https://github.com/NVIDIA/nvidia-container-toolkit install locally.

Options:
  --model-id TEXT                 Optional model_id name or path for (fine-tune) weight.
  -o, --output [json|pretty|porcelain]
                                  Showing output type.  [env var: OPENLLM_OUTPUT; default: pretty]
  --overwrite                     Overwrite existing Bento for given LLM if it already exists.
  --workers-per-resource FLOAT    Number of workers per resource assigned. See
                                  https://docs.bentoml.org/en/latest/guides/scheduling.html#resource-scheduling-
                                  strategy for more information. By default, this is set to 1.

                                  NOTE: The workers value passed into 'build' will determine how the LLM can be
                                  provisioned in Kubernetes as well as in standalone container. This will ensure it
                                  has the same effect with 'openllm start --workers ...'
  Optimisation options.: [mutually_exclusive]
    --quantize [int8|int4|gptq]   Set quantization mode for serving in deployment.

                                  GPTQ is currently working in progress and will be available soon.

                                  NOTE: Quantization is only available for PyTorch models.
    --bettertransformer           Apply FasterTransformer wrapper to serve model. This will applies during serving
                                  time.
  --enable-features FEATURE[,FEATURE]
                                  Enable additional features for building this LLM Bento. Available: mpt, fine-tune,
                                  chatglm, agents, flan-t5, playground, starcoder, openai, falcon
  --adapter-id [PATH | [remote/][adapter_name:]adapter_id][, ...]
                                  Optional adapters id to be included within the Bento. Note that if you are using
                                  relative path, '--build-ctx' must be passed.
  --build-ctx TEXT                Build context. This is required if --adapter-id uses relative path
  --model-version TEXT            Model version provided for this 'model-id' if it is a custom path.
  --dockerfile-template FILENAME  Optional custom dockerfile template to be used with this BentoLLM.
  Miscellaneous options: 
    -q, --quiet                   Suppress all output.
    --debug, --verbose            Print out debug logs.
    --do-not-track                Do not send usage info
  -h, --help                      Show this message and exit.
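
For reference, a hedged example of the command shape documented above; the model id is illustrative and does not come from the original thread:

```bash
# Sketch only: flags are taken from the help text above, the model id is an example.
openllm build mpt --model-id mosaicml/mpt-7b-instruct --workers-per-resource 1
```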

aarnphm commented 1 year ago

Sorry for the late reply, but are there any updates on this? Feel free to reopen if you are still running into this issue.

I can build mpt with openllm build (tested on Linux and macOS).
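
Note that the reporter's environment shows openllm 0.1.17, while the help output above already lists mpt as a build target, so the gap is most likely just the installed version. A hedged upgrade-and-retry sketch:

```bash
# Assumption: the missing `mpt` choice is a version gap, not a platform issue.
pip install --upgrade "openllm[mpt]"
openllm build mpt
```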