THUDM / CodeGeeX2

CodeGeeX2: A More Powerful Multilingual Code Generation Model
https://codegeex.cn
Apache License 2.0

codegeex2-6b-int4 inference hangs and produces no output #269

Open LBJ6666 opened 4 months ago

LBJ6666 commented 4 months ago

Running inference with the demo hangs and produces no output.

# model and tokenizer are loaded as in the repo demo
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b-int4", trust_remote_code=True).cuda().eval()

prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_length=256, top_k=1)
response = tokenizer.decode(outputs[0])
print(response)

It also prints this warning:

modeling_chatglm.py:227: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,

Is the hang related to this warning?

Versions: transformers==4.30.2, protobuf==4.24.4, cpm-kernels==1.0.11, torch==2.3.1+cu121
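For what it's worth, the warning by itself is unlikely to be the cause: when the flash-attention kernel is unavailable, PyTorch's scaled_dot_product_attention falls back to the math or memory-efficient backends rather than blocking. A minimal check (tensor shapes here are arbitrary, not taken from the model):

```python
import torch
import torch.nn.functional as F

# Sanity check: scaled_dot_product_attention still runs without the
# flash-attention kernel, since PyTorch falls back to other backends.
# Shapes follow the (batch, num_heads, seq_len, head_dim) convention.
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([1, 2, 4, 8])
```

If this runs, attention itself is fine, which points at the int4 quantization kernels (cpm-kernels) or the generate loop as the more likely culprit.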