ProjectD-AI / llama_inference

llama inference for tencentpretrain
GNU General Public License v3.0

Incomplete output with int8 quantization #19

Closed zhenglinpan closed 1 year ago

zhenglinpan commented 1 year ago

Steps to reproduce:

prompts.txt:
You are a radiologist who needs to write a complete PET examination report; the patient's name is Zhang San.

result.txt (the output stops mid-sentence):
You are a radiologist who needs to write a complete PET examination report; the patient's name is Zhang San. OK, I will give a detailed and comprehensive examination report based on the information and data provided. The following is an analysis of Zhang San's PET examination results: Zhang San underwent a

Terminal command:

python llama_infer.py --test_path ./prompts.txt --prediction_path ./result.txt --load_model_path /path/to/llm/models/chatflow_7b.bin --use_int8 --config_path ./config/llama_7b_config.json --spm_model_path ./tokenizer.model

Model downloaded from: https://huggingface.co/Linly-AI/ChatFlow-7B/tree/main/chatflow_7b.bin

zhenglinpan commented 1 year ago

Additionally, when the input is long, the model raises the following error:

Traceback (most recent call last):
  File "llama_infer.py", line 68, in <module>
    result = lm_generation.generate(args, prompts)
  File "/home/user/LLM/ChatFlow/llama_inference/generate.py", line 91, in generate
    tokens[idx, : len(t)] = torch.tensor(t).long()
RuntimeError: The expanded size of the tensor (128) must match the existing size (151) at non-singleton dimension 0.  Target sizes: [128].  Tensor sizes: [151]
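The error comes from copying a tokenized prompt into a buffer preallocated with `--seq_length` columns (default 128): a prompt that tokenizes to 151 ids cannot fit. A minimal sketch reproducing the mismatch, with hypothetical sizes taken from the traceback (151-token prompt, 128-column buffer):

```python
import torch

seq_length = 128                   # default --seq_length in llama_infer.py
prompt_tokens = list(range(151))   # stand-in for a prompt that tokenizes to 151 ids

tokens = torch.zeros((1, seq_length), dtype=torch.long)
err = None
try:
    # The slice clamps to 128 columns, so a 151-element tensor cannot be copied in,
    # producing the same RuntimeError as generate.py line 91.
    tokens[0, : len(prompt_tokens)] = torch.tensor(prompt_tokens).long()
except RuntimeError as e:
    err = str(e)
print(err)
```

This shows the crash is purely a buffer-size limit, not a quantization problem.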
fengyh3 commented 1 year ago

Try adding --seq_length 512 to the command-line arguments; the default is 128.
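Raising `--seq_length` enlarges the token buffer so long prompts fit. A defensive variant (an illustrative helper, not code from llama_inference) would truncate overly long prompts instead of crashing:

```python
import torch

def fill_tokens(prompt_ids, seq_length, pad_id=0):
    """Copy prompt ids into a fixed-size (1, seq_length) buffer.

    Hypothetical helper: if the prompt is longer than seq_length, keep the
    most recent tokens so generation continues from the prompt's tail,
    rather than raising a RuntimeError on assignment.
    """
    if len(prompt_ids) > seq_length:
        prompt_ids = prompt_ids[-seq_length:]
    tokens = torch.full((1, seq_length), pad_id, dtype=torch.long)
    tokens[0, : len(prompt_ids)] = torch.tensor(prompt_ids, dtype=torch.long)
    return tokens

# A 151-token prompt now fits either way: padded at 512, truncated at 128.
buf = fill_tokens(list(range(151)), seq_length=512)
```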

zhenglinpan commented 1 year ago

Tested and confirmed working, thanks ❤