Why does there seem to be no noticeable effect?

shinerdeng commented 8 months ago

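(Screenshot: 微信截图_20240119160908)

Model: Qwen-14B-Chat-Int4, on 2 x RTX 3090, with flash-attention installed. I used the example below, with the prompt “编一个200字左右的儿童故事” ("Write a children's story of about 200 characters"): https://github.com/alipay/PainlessInferenceAcceleration/blob/main/pia/lookahead/examples/qwen_example.py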

chenliangjyj commented 8 months ago

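If the model fits on a single card, try to keep it on one card. With `auto`, the model is split across cards pipeline-style, and the matrices we transfer between them are fairly large, so performance suffers. You can change the device map to `cuda:0`, or launch with `CUDA_VISIBLE_DEVICES=0`.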
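The two options above could look roughly like the sketch below. It is an illustration under assumptions, not the repository's example script: `load_kwargs` and `load_model` are hypothetical helpers, and `AutoModelForCausalLM` from `transformers` stands in for whatever model class the example actually uses.

```python
import os

# Option 1: expose only the first GPU to this process.
# This has to happen before CUDA is initialized, hence before importing torch.
# setdefault keeps any value already exported in the shell.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")


def load_kwargs(single_gpu: bool = True) -> dict:
    """Build keyword arguments for from_pretrained (hypothetical helper)."""
    return {
        # Option 2: pin the whole model to one device instead of "auto",
        # which would spread the layers over every visible GPU.
        "device_map": "cuda:0" if single_gpu else "auto",
        # Qwen checkpoints ship custom modeling code.
        "trust_remote_code": True,
    }


def load_model(model_dir: str):
    """Load a causal LM onto a single GPU (assumes transformers is installed)."""
    # Imported lazily so the environment variable above is set first.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_dir, **load_kwargs())
```

The shell equivalent of option 1 is to prefix the launch command with `CUDA_VISIBLE_DEVICES=0`, which needs no code change.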

shinerdeng commented 8 months ago

This is already the single-card case. Speed is about the same with and without lookahead.
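To put a number on a with/without comparison like this, one can time generation and report tokens per second. The sketch below is a generic harness under assumptions: `tokens_per_second` is a hypothetical helper, and the `generate` callable is expected to wrap the real model call and return the newly generated token ids.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[], Sequence[int]],
    warmup: int = 1,
    runs: int = 3,
) -> float:
    """Average generated tokens per second over `runs` timed calls."""
    # Warm-up calls are excluded so one-off costs (kernel compilation,
    # cache allocation) do not distort the measurement.
    for _ in range(warmup):
        generate()

    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        # `generate` should return only after the tokens are available on
        # the host (e.g. as a Python list), so that GPU work is included.
        total_tokens += len(generate())
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed
```

Running it once with the acceleration enabled and once with it disabled, on the same prompt and the same maximum output length, gives two figures that can be compared directly.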

chenliangjyj commented 8 months ago

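The numbers above are from my own test. The machine is an A800 with transformers==4.32.0, and the prompt was “杭州在哪里？” ("Where is Hangzhou?").

To reproduce your result, could you share the versions in your environment: transformers, torch, auto-gptq, optimum, CUDA, and so on?

We had not originally adapted lookahead to Qwen's official Int4 quantized model, so the acceleration may be affected by some precision issues. We will look into this further.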
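A short script can gather the versions requested here. This is a minimal sketch using only the standard library's `importlib.metadata`; the package list and the helper names `collect_versions` and `cuda_version` are illustrative choices, and the CUDA figure is the version torch was built against, not necessarily the installed driver.

```python
from importlib import metadata

# Distribution names as they appear in `pip list`; extend as needed.
PACKAGES = ("transformers", "torch", "auto-gptq", "optimum", "flash-attn", "pia")


def collect_versions(packages=PACKAGES) -> dict:
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_version():
    """Return the CUDA version torch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
    print(f"cuda (torch build): {cuda_version() or 'unknown'}")
```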

shinerdeng commented 8 months ago

I did test “杭州在哪里？” ("Where is Hangzhou?"), but the output is too short for a meaningful comparison, so I used a longer prompt. My environment:

accelerate 0.26.1
aiohttp 3.9.1
aiosignal 1.3.1
annotated-types 0.6.0
anyio 4.2.0
async-timeout 4.0.3
attrs 23.2.0
auto-gptq 0.6.0
blinker 1.7.0
certifi 2023.7.22
charset-normalizer 3.3.2
click 8.1.7
coloredlogs 15.0.1
datasets 2.16.1
dill 0.3.7
dropout-layer-norm 0.1
einops 0.7.0
exceptiongroup 1.2.0
fastapi 0.109.0
filelock 3.13.1
flash-attn 2.4.2
Flask 3.0.0
frozenlist 1.4.1
fsspec 2023.10.0
gekko 1.0.6
h11 0.14.0
huggingface-hub 0.20.2
humanfriendly 10.0
idna 3.4
iniconfig 2.0.0
itsdangerous 2.1.2
Jinja2 3.1.2
MarkupSafe 2.1.3
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
networkx 3.2.1
ninja 1.11.1.1
numpy 1.26.2
optimum 1.16.1
packaging 23.2
pandas 2.1.4
peft 0.7.1
pia 0.0.2
Pillow 10.1.0
pip 23.3.2
pluggy 1.3.0
protobuf 4.25.2
psutil 5.9.7
pyarrow 14.0.2
pyarrow-hotfix 0.6
pydantic 2.5.3
pydantic_core 2.14.6
pytest 7.4.4
python-dateutil 2.8.2
pytz 2023.3.post1
PyYAML 6.0.1
regex 2023.12.25
requests 2.31.0
rotary-emb 0.1
rouge 1.0.1
safetensors 0.4.1
scipy 1.11.4
sentencepiece 0.1.99
setuptools 65.5.0
six 1.16.0
sniffio 1.3.0
sse-starlette 1.8.2
starlette 0.35.1
sympy 1.12
tiktoken 0.5.2
tokenizers 0.13.3
tomli 2.0.1
torch 2.1.1+cu121
torchaudio 2.1.1+cu121
torchvision 0.16.1+cu121
tqdm 4.66.1
transformers 4.32.0
transformers-stream-generator 0.0.4
triton 2.1.0
typing_extensions 4.8.0
tzdata 2023.4
urllib3 2.1.0
uvicorn 0.25.0
Werkzeug 3.0.1
wheel 0.42.0
xxhash 3.4.1
yarl 1.9.4

chenliangjyj commented 8 months ago

You can test with the latest script: https://github.com/alipay/PainlessInferenceAcceleration/blob/main/pia/lookahead/examples/qwen_quant_example.py

shinerdeng commented 8 months ago

It works! Thank you!

alipay / PainlessInferenceAcceleration

Why does there seem to be no noticeable effect? #6