Closed jeffwang0516 closed 2 months ago
Hi there, thanks for creating the issue.
Do you have vllm available locally?
Hi
I'm still not able to run this model with the vLLM backend due to insufficient GPU memory (a T4 with 16 GB doesn't seem to be enough).
After some research, I think the root cause might be that a single complete Chinese character can be decoded from multiple output tokens, so decoding to text on every generation iteration is not feasible for Chinese.
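For example, a quick illustration with this model's tokenizer (a sketch only; the exact token split depends on the tokenizer's vocabulary):

```python
# Illustrative sketch: with the Llama 2 tokenizer, many Chinese characters are
# not in the vocabulary and fall back to multiple UTF-8 byte tokens, so no
# single generated token corresponds to a complete character.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yentinglin/Taiwan-LLM-7B-v2.1-chat")

ids = tokenizer.encode("嗨", add_special_tokens=False)
print(ids)  # often several ids for one character (byte fallback)
```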
Sounds like an issue orthogonal to OpenLLM?
For the PyTorch backend, it is related to OpenLLM in the implementation of PyTorchRunnable. It might need some way to detect incomplete characters on each generation step, probably something like what the text-generation-inference server did here OR what transformers' TextStreamer does here.
If the vLLM backend already handles this, then OpenLLM should be fine there, but I'm not able to verify that at the moment.
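Something along these lines could work for PyTorchRunnable (a simplified sketch of the hold-back idea, not the exact TextStreamer or TGI code; the model name is just the one from this issue):

```python
# Re-decode the accumulated token ids on every step and only emit text once it
# no longer ends in an incomplete character (the U+FFFD replacement character).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yentinglin/Taiwan-LLM-7B-v2.1-chat")

def stream_decode(token_id_iter):
    buffer, emitted = [], 0
    for token_id in token_id_iter:
        buffer.append(token_id)
        text = tokenizer.decode(buffer, skip_special_tokens=True)
        if text.endswith("\ufffd"):
            # The last character is still incomplete; wait for more tokens.
            continue
        new_text, emitted = text[emitted:], len(text)
        if new_text:
            yield new_text
```

Each yielded chunk is then valid text even when a Chinese character spans several tokens.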
I tried to fix the problem with the text-generation-inference server approach (related issue: https://github.com/huggingface/text-generation-inference/issues/333). Please have a look, thanks!
FYI, I found that vLLM has also fixed this issue with the text-generation-inference approach in this PR: https://github.com/vllm-project/vllm/pull/984
I will take a look at doing detokenization incrementally for the PyTorch backend.
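For reference, the approach in text-generation-inference and that vLLM PR roughly looks like this (a paraphrased sketch, not the exact upstream code):

```python
# Keep two offsets into the generated token ids and only emit the newly decoded
# suffix once it no longer ends in a replacement character.
def decode_token(tokenizer, all_token_ids, prefix_offset=0, read_offset=0):
    prefix_text = tokenizer.decode(all_token_ids[prefix_offset:read_offset], skip_special_tokens=False)
    new_text = tokenizer.decode(all_token_ids[prefix_offset:], skip_special_tokens=False)
    if len(new_text) > len(prefix_text) and not new_text.endswith("\ufffd"):
        # A complete character (or more) is now available: emit the delta
        # and advance the offsets.
        return new_text[len(prefix_text):], read_offset, len(all_token_ids)
    # Still mid-character: emit nothing and keep the offsets unchanged.
    return "", prefix_offset, read_offset
```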
Closing for OpenLLM 0.6.
Describe the bug
I've recently been trying to use a fine-tuned version of Llama 2 that supports Traditional Chinese: https://huggingface.co/yentinglin/Taiwan-LLM-7B-v2.1-chat
The output text from CompletionChunk seems to have some encoding issue. If I directly use tokenizer.decode on the generated token_ids, the output is fine.
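Roughly, the behavior looks like this (an illustrative sketch with plain transformers, not my actual repro code below):

```python
# Concatenating text decoded chunk-by-chunk can garble Chinese output, while
# decoding the accumulated token_ids in a single call is fine.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yentinglin/Taiwan-LLM-7B-v2.1-chat")

token_ids = tokenizer.encode("你好嗎", add_special_tokens=False)
per_token = "".join(tokenizer.decode([i]) for i in token_ids)
print(per_token)                    # may contain U+FFFD where a character spans multiple tokens
print(tokenizer.decode(token_ids))  # '你好嗎'
```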
To reproduce
Here's how to reproduce the issue:
Output:
Logs
No response
Environment
Environment variable
System information
bentoml: 1.1.10
python: 3.8.10
platform: Linux-5.4.0-169-generic-x86_64-with-glibc2.29
uid_gid: 1000:1000
pip_packages:
```
accelerate==0.25.0
aiohttp==3.9.1
aiosignal==1.3.1
anyio==4.2.0
appdirs==1.4.4
asgiref==3.7.2
async-timeout==4.0.3
attrs==23.1.0
bentoml==1.1.10
bitsandbytes==0.41.3.post2
build==0.10.0
cattrs==23.1.2
certifi==2023.11.17
charset-normalizer==3.3.2
circus==0.18.0
click==8.1.7
click-option-group==0.5.6
cloudpickle==3.0.0
coloredlogs==15.0.1
contextlib2==21.6.0
cuda-python==12.3.0
datasets==2.15.0
deepmerge==1.1.1
Deprecated==1.2.14
dill==0.3.7
distlib==0.3.8
distro==1.8.0
einops==0.7.0
exceptiongroup==1.2.0
fastcore==1.5.29
filelock==3.9.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2023.12.2
ghapi==1.0.4
h11==0.14.0
httpcore==1.0.2
httpx==0.26.0
huggingface-hub==0.20.1
humanfriendly==10.0
idna==3.6
importlib-metadata==6.11.0
inflection==0.5.1
Jinja2==3.1.2
markdown-it-py==3.0.0
MarkupSafe==2.1.3
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.15
mypy-extensions==1.0.0
networkx==3.0
numpy==1.24.4
nvidia-ml-py==11.525.150
openllm==0.4.41
openllm-client==0.4.41
openllm-core==0.4.41
opentelemetry-api==1.20.0
opentelemetry-instrumentation==0.41b0
opentelemetry-instrumentation-aiohttp-client==0.41b0
opentelemetry-instrumentation-asgi==0.41b0
opentelemetry-sdk==1.20.0
opentelemetry-semantic-conventions==0.41b0
opentelemetry-util-http==0.41b0
optimum==1.16.1
orjson==3.9.10
packaging==23.2
pandas==2.0.3
pathspec==0.12.1
pip-requirements-parser==32.0.1
pip-tools==7.3.0
platformdirs==4.1.0
prometheus-client==0.19.0
psutil==5.9.7
pyarrow==14.0.2
pyarrow-hotfix==0.6
pygments==2.17.2
pyparsing==3.1.1
pyproject-hooks==1.0.0
python-dateutil==2.8.2
python-json-logger==2.0.7
python-multipart==0.0.6
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.2
regex==2023.10.3
requests==2.31.0
rich==13.7.0
safetensors==0.4.1
schema==0.7.5
scipy==1.10.1
sentencepiece==0.1.99
simple-di==0.1.5
six==1.16.0
sniffio==1.3.0
starlette==0.34.0
sympy==1.12
tokenizers==0.15.0
tomli==2.0.1
torch==2.1.0+cu121
tornado==6.4
tqdm==4.66.1
transformers==4.36.2
triton==2.1.0
typing-extensions==4.4.0
tzdata==2023.3
urllib3==2.1.0
uvicorn==0.25.0
virtualenv==20.25.0
watchfiles==0.21.0
wrapt==1.16.0
xxhash==3.4.1
yarl==1.9.4
zipp==3.17.0
```
System information (Optional)
transformers version: 4.36.2