OrionStarAI / Orion

Orion-14B is a family of models comprising a 14B-parameter multilingual foundation LLM and a series of derived models: a chat model, a long-context model, a quantized model, a RAG fine-tuned model, and an agent fine-tuned model.
Apache License 2.0

Orion-14B-Chat-Int4 chat error #29

Open ifromeast opened 7 months ago

ifromeast commented 7 months ago

Hi, when I run Orion-14B-Chat-Int4 with the following code on an A800-80G:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "OrionStarAI/Orion-14B-Chat-Int4"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16,
                                             trust_remote_code=True, use_flash_attention_2=True)

model.generation_config = GenerationConfig.from_pretrained(model_name)

import time

query = '世界第二高峰是哪个'
messages = [{"role": "user", "content": query}]
response = model.chat(tokenizer, messages, streaming=False)
print(response)

I hit the following error:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
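For context on what this message means: it is raised by `torch.multinomial` during sampling when the probability tensor it is given contains `inf`, `nan`, or a negative entry — usually the downstream symptom of a dtype or weight-loading problem, not a bug in the sampling call itself. A minimal stdlib sketch of the condition it enforces (`probs_are_valid` is a hypothetical helper name, not part of torch):

```python
import math

def probs_are_valid(probs):
    # Hypothetical helper mirroring the validation torch.multinomial
    # performs on its input: every probability must be finite and
    # non-negative, otherwise it raises the RuntimeError quoted above.
    return all(math.isfinite(p) and p >= 0 for p in probs)

print(probs_are_valid([0.1, 0.7, 0.2]))      # True  - well-formed distribution
print(probs_are_valid([float("nan"), 0.5]))  # False - nan entry
print(probs_are_valid([float("inf"), 0.5]))  # False - inf entry
print(probs_are_valid([-0.1, 1.1]))          # False - negative entry
```

So the question is why the model's softmax output went non-finite in the first place.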

and the following is the environment

transformers              4.36.2                   pypi_0    pypi
torch                     2.1.2+cu118              pypi_0    pypi
flash-attn                2.5.0                    pypi_0    pypi
accelerate                0.26.1                   pypi_0    pypi

Is there anything wrong with my setup?

YIZXIY commented 7 months ago

I suspect that, like me, you noticed the Int4 version has no pytorch_model.bin.index.json and tried to hack in the pytorch_model.bin.index.json from Orion-14B-Chat, without success. I get the same error as you, but normally this should instead raise a "model not found" error.
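If the index file really was copied over from the full-precision repo, it will reference shard files that do not exist in the Int4 snapshot. A quick stdlib check of a downloaded snapshot directory can confirm this (the `find_weight_layout` helper is hypothetical, written for this thread):

```python
import json
import pathlib

def find_weight_layout(model_dir):
    # Hypothetical helper: report which weight layout a locally downloaded
    # checkpoint uses, and whether its index file matches the files on disk.
    d = pathlib.Path(model_dir)
    index = d / "pytorch_model.bin.index.json"
    if index.exists():
        # A sharded repo's index maps each parameter to a shard file; any
        # shard named there but absent on disk means the index is bogus.
        shards = set(json.loads(index.read_text())["weight_map"].values())
        missing = sorted(s for s in shards if not (d / s).exists())
        return {"layout": "sharded", "missing_shards": missing}
    # No index file: the repo ships standalone weight files (common for
    # quantized checkpoints) rather than shards.
    files = sorted(p.name for p in d.iterdir()
                   if p.suffix in {".bin", ".safetensors"})
    return {"layout": "single-file" if files else "unknown", "files": files}
```

A non-empty `missing_shards` list on the Int4 snapshot would confirm that a foreign index was dropped into the directory.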