GPU inference with BioMedGPT-LM-7B hangs and produces no output

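Inference on the CPU produces output, but on the GPU the run prints nothing even after an hour, with GPU utilization at 100% the whole time. The code is as follows: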

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map='cuda:0')

text = ["What's the function of Aspirin?"]
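# Tokenize the prompt and move the resulting tensors to the GPU
inputs = tokenizer(text,
                   truncation=True,
                   return_tensors="pt").to("cuda:0")

# Generate up to 128 new tokens from the prompt
output = model.generate(inputs=inputs.input_ids, max_new_tokens=128, early_stopping=True)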
print(tokenizer.decode(output[0]))
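
One thing that may be worth ruling out is GPU memory. This is a hypothesis, not a confirmed cause. The script above does not pass `torch_dtype` to `from_pretrained`, so the weights may be loaded in float32. The rough estimate below assumes about 7e9 parameters (inferred only from the "7B" in the model name) and counts the weights alone, ignoring activations and the KV cache. `weight_memory_gib` is a hypothetical helper written for this illustration.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: roughly 7e9 parameters, inferred from the "7B" in the model name.
NUM_PARAMS = 7e9

fp32_gib = weight_memory_gib(NUM_PARAMS, 4)  # float32: 4 bytes per parameter
fp16_gib = weight_memory_gib(NUM_PARAMS, 2)  # float16: 2 bytes per parameter

print(f"float32 weights: ~{fp32_gib:.1f} GiB")
print(f"float16 weights: ~{fp16_gib:.1f} GiB")
```

If the float32 figure is close to or above the card's memory, passing `torch_dtype=torch.float16` to `from_pretrained` would be one variation to try; whether it resolves the hang here is untested.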

PharMolix / OpenBioMed, issue #65