AI-Mart opened this issue 3 days ago
That's quite strange; I haven't run into this problem before. Are apex and flash-attn installed in your environment? If apex is installed, I'd recommend uninstalling it; if flash-attn is not installed, try installing it, because the ViT part of the VLM in lmdeploy should still run on the PyTorch backend.
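A minimal sketch of the checks suggested above (package names as published on PyPI; the flash-attn wheel or build must match the local torch and CUDA versions, so the install command may need adjusting for your environment):

pip show apex flash-attn                       # confirm what is currently installed
pip uninstall -y apex                          # remove apex if it is present
pip install flash-attn --no-build-isolation    # try the flash-attn backend if it is missing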
accelerate 0.34.2 addict 2.4.0 aiofiles 23.2.1 aiohappyeyeballs 2.4.0 aiohttp 3.10.5 aiosignal 1.3.1 altair 5.4.1 annotated-types 0.7.0 anyio 4.4.0 archspec 0.2.1 asttokens 2.0.5 astunparse 1.6.3 async-timeout 4.0.3 attrs 23.1.0 backcall 0.2.0 beautifulsoup4 4.12.2 bitsandbytes 0.41.0 blinker 1.8.2 boltons 23.0.0 Brotli 1.0.9 cachetools 5.5.0 certifi 2023.11.17 cffi 1.16.0 chardet 4.0.0 charset-normalizer 2.0.4 click 8.1.7 conda 23.9.0 conda-build 3.28.1 conda-content-trust 0.2.0 conda_index 0.3.0 conda-libmamba-solver 23.7.0 conda-package-handling 2.2.0 conda_package_streaming 0.9.0 contourpy 1.3.0 cryptography 41.0.7 cycler 0.12.1 decorator 5.1.1 decord 0.6.0 deepspeed 0.13.5 distro 1.8.0 dnspython 2.4.2 einops 0.6.1 einops-exts 0.0.4 exceptiongroup 1.0.4 executing 0.8.3 expecttest 0.1.6 fastapi 0.114.1 ffmpy 0.4.0 filelock 3.13.1 fire 0.6.0 flash-attn 2.3.6 fonttools 4.53.1 frozenlist 1.4.1 fsspec 2023.12.2 gitdb 4.0.11 GitPython 3.1.43 gmpy2 2.1.2 gradio 4.42.0 gradio_client 1.3.0 h11 0.14.0 hjson 3.1.0 httpcore 1.0.5 httpx 0.27.2 huggingface-hub 0.24.6 hypothesis 6.92.0 idna 3.4 imageio 2.35.1 importlib_metadata 8.5.0 importlib_resources 6.4.5 ipython 8.15.0 jedi 0.18.1 Jinja2 3.1.2 jiter 0.5.0 joblib 1.4.2 jsonpatch 1.32 jsonpointer 2.1 jsonschema 4.19.2 jsonschema-specifications 2023.7.1 kiwisolver 1.4.7 latex2mathml 3.77.0 libarchive-c 2.9 libmambapy 1.5.3 linkify-it-py 2.0.3 lmdeploy 0.5.3 markdown-it-py 2.2.0 markdown2 2.5.0 MarkupSafe 2.1.1 matplotlib 3.9.2 matplotlib-inline 0.1.6 mdit-py-plugins 0.3.3 mdurl 0.1.2 menuinst 2.0.1 mkl-fft 1.3.8 mkl-random 1.2.4 mkl-service 2.4.0 mmengine-lite 0.10.4 more-itertools 10.1.0 mpmath 1.3.0 multidict 6.1.0 narwhals 1.7.0 networkx 3.1 ninja 1.11.1.1 numpy 1.26.2 nvidia-cublas-cu12 12.6.1.4 nvidia-cuda-runtime-cu12 12.6.68 nvidia-curand-cu12 10.3.7.68 nvidia-nccl-cu12 2.23.4 openai 1.42.0 opencv-python-headless 4.10.0.84 orjson 3.10.7 packaging 23.1 pandas 2.2.2 parso 0.8.3 peft 0.11.1 pexpect 4.8.0 pickleshare 0.7.5 Pillow 10.0.1 pip 23.3.1 pkginfo 1.9.6 platformdirs 3.10.0 pluggy 1.0.0 prompt-toolkit 3.0.36 protobuf 5.28.1 psutil 5.9.0 ptyprocess 0.7.0 pure-eval 0.2.2 py-cpuinfo 9.0.0 pyarrow 17.0.0 pycocoevalcap 1.2 pycocotools 2.0.8 pycosat 0.6.6 pycparser 2.21 pydantic 2.9.1 pydantic_core 2.23.3 pydeck 0.9.1 pydub 0.25.1 Pygments 2.15.1 pynvml 11.5.3 pyOpenSSL 23.2.0 pyparsing 3.1.4 PySocks 1.7.1 python-dateutil 2.9.0.post0 python-etcd 0.4.5 python-multipart 0.0.9 pytz 2023.3.post1 PyYAML 6.0.1 referencing 0.30.2 regex 2024.9.11 requests 2.31.0 rich 13.8.1 rpds-py 0.10.6 ruamel.yaml 0.17.21 ruamel.yaml.clib 0.2.6 ruff 0.6.4 safetensors 0.4.5 scikit-learn 1.5.2 scipy 1.14.1 semantic-version 2.10.0 sentencepiece 0.1.99 setuptools 68.2.2 shellingham 1.5.4 shortuuid 1.0.13 six 1.16.0 smmap 5.0.1 sniffio 1.3.1 sortedcontainers 2.4.0 soupsieve 2.5 stack-data 0.2.0 starlette 0.38.5 streamlit 1.38.0 streamlit-image-select 0.6.0 svgwrite 1.4.3 sympy 1.12 tenacity 8.5.0 tensorboardX 2.6.2.2 termcolor 2.4.0 threadpoolctl 3.5.0 tiktoken 0.7.0 timm 0.9.12 tokenizers 0.15.1 toml 0.10.2 tomli 2.0.1 tomlkit 0.12.0 toolz 0.12.0 torch 2.1.2 torchaudio 2.1.2 torchelastic 0.2.2 torchvision 0.16.2 tornado 6.4.1 tqdm 4.65.0 traitlets 5.7.1 transformers 4.37.2 triton 2.1.0 truststore 0.8.0 typer 0.12.5 types-dataclasses 0.6.6 typing_extensions 4.12.2 tzdata 2024.1 uc-micro-py 1.0.3 urllib3 2.2.2 uvicorn 0.30.6 watchdog 4.0.2 wavedrom 2.0.3.post3 wcwidth 0.2.5 websockets 12.0 wheel 0.41.2 yacs 0.1.8 yapf 0.40.2 yarl 1.11.1 zipp 3.20.1 zstandard 0.19.0
flash-attn is installed:
Name: flash-attn
Version: 2.3.6
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: trid@cs.stanford.edu
License:
Location: /opt/conda/lib/python3.10/site-packages
Requires: einops, ninja, packaging, torch
Required-by:
apex is not installed.
Describe the bug
The results above are from two deployments: InternVL2-Llama3-76B-AWQ produces output normally, while InternVL2-Llama3-76B produces abnormal output and the total token count drops to 0.
Reproduction
Launch command for the InternVL2-Llama3-76B-AWQ server:
lmdeploy serve api_server /usr/local/serving/models/OpenGVLab/InternVL2-Llama3-76B-AWQ --backend turbomind --server-port 23333 --model-format awq --tp 2 --cache-max-entry-count 0.1
Launch command for the InternVL2-Llama3-76B server:
lmdeploy serve api_server /usr/local/serving/models/OpenGVLab/InternVL2-Llama3-76B --backend turbomind --server-port 23333 --tp 4 --cache-max-entry-count 0.05 --session-len 256000
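For reference, one way to reproduce and inspect the zero-token behaviour against either running server is through the OpenAI-compatible endpoint that lmdeploy's api_server exposes; the model name below is an assumption, so query /v1/models first to get the name actually being served:

curl http://localhost:23333/v1/models
curl http://localhost:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "InternVL2-Llama3-76B",
        "messages": [{"role": "user", "content": "Describe the image in one sentence."}],
        "max_tokens": 64
      }'

The "usage" field of the response should report a non-zero total_tokens for a healthy deployment; in the failing case it reportedly comes back as 0.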
Environment
Error traceback
No response