ProjectD-AI / llama_inference

llama inference for tencentpretrain
GNU General Public License v3.0

Incomplete output with int8 quantization #19

Closed zhenglinpan closed 1 year ago

zhenglinpan commented 1 year ago

Steps to reproduce:

prompts.txt:
You are a radiologist who needs to write a complete PET examination report; the patient's name is Zhang San.

result.txt (the output stops mid-sentence):
You are a radiologist who needs to write a complete PET examination report; the patient's name is Zhang San. OK, I will give a detailed and comprehensive examination report based on the information and data provided. The following is an analysis of Zhang San's PET examination results: Zhang San underwent a

Terminal command:

python llama_infer.py --test_path ./prompts.txt --prediction_path ./result.txt --load_model_path /path/to/llm/models/chatflow_7b.bin --use_int8 --config_path ./config/llama_7b_config.json --spm_model_path ./tokenizer.model

Model downloaded from: https://huggingface.co/Linly-AI/ChatFlow-7B/tree/main/chatflow_7b.bin

zhenglinpan commented 1 year ago

Additionally, when the input is long, the model raises the following error:

Traceback (most recent call last):
  File "llama_infer.py", line 68, in <module>
    result = lm_generation.generate(args, prompts)
  File "/home/user/LLM/ChatFlow/llama_inference/generate.py", line 91, in generate
    tokens[idx, : len(t)] = torch.tensor(t).long()
RuntimeError: The expanded size of the tensor (128) must match the existing size (151) at non-singleton dimension 0.  Target sizes: [128].  Tensor sizes: [151]
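The error comes from copying a tokenized prompt into a buffer preallocated with `--seq_length` columns (default 128): a prompt that tokenizes to 151 ids cannot fit. A minimal sketch reproducing the mismatch, with hypothetical sizes taken from the traceback (151-token prompt, 128-column buffer):

```python
import torch

seq_length = 128                   # default --seq_length in llama_infer.py
prompt_tokens = list(range(151))   # stand-in for a prompt that tokenizes to 151 ids

tokens = torch.zeros((1, seq_length), dtype=torch.long)
err = None
try:
    # The slice clamps to 128 columns, so a 151-element tensor cannot be copied in,
    # producing the same RuntimeError as generate.py line 91.
    tokens[0, : len(prompt_tokens)] = torch.tensor(prompt_tokens).long()
except RuntimeError as e:
    err = str(e)
print(err)
```

This shows the crash is purely a buffer-size limit, not a quantization problem.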
fengyh3 commented 1 year ago

Try adding --seq_length 512 to the command-line arguments; the default is 128.
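Raising `--seq_length` enlarges the token buffer so long prompts fit. A defensive variant (an illustrative helper, not code from llama_inference) would truncate overly long prompts instead of crashing:

```python
import torch

def fill_tokens(prompt_ids, seq_length, pad_id=0):
    """Copy prompt ids into a fixed-size (1, seq_length) buffer.

    Hypothetical helper: if the prompt is longer than seq_length, keep the
    most recent tokens so generation continues from the prompt's tail,
    rather than raising a RuntimeError on assignment.
    """
    if len(prompt_ids) > seq_length:
        prompt_ids = prompt_ids[-seq_length:]
    tokens = torch.full((1, seq_length), pad_id, dtype=torch.long)
    tokens[0, : len(prompt_ids)] = torch.tensor(prompt_ids, dtype=torch.long)
    return tokens

# A 151-token prompt now fits either way: padded at 512, truncated at 128.
buf = fill_tokens(list(range(151)), seq_length=512)
```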

zhenglinpan commented 1 year ago

Tested and confirmed working, thanks ❤