Running inference with the demo hangs and produces no output. The following warning is printed:

modeling_chatglm.py:227: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
  context_layer = torch.nn.functional.scaled_dot_product_attention(query_layer, key_layer, value_layer,

Is the hang related to this warning?
Versions: transformers==4.30.2, protobuf==4.24.4, cpm-kernels==1.0.11, torch==2.3.1+cu121
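Not part of the original report, but a minimal diagnostic sketch for separating the two symptoms, assuming a CUDA build of torch 2.3+ and illustrative tensor shapes/dtype: it prints which scaled_dot_product_attention backends are enabled and then forces the math backend via torch.nn.attention.sdpa_kernel. If a standalone call like this completes but the demo still hangs, the flash-attention warning is probably not the cause.

import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel  # available in torch >= 2.3

# Which SDPA backends does this build allow?
print("flash:", torch.backends.cuda.flash_sdp_enabled())
print("mem_efficient:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math:", torch.backends.cuda.math_sdp_enabled())

# Illustrative inputs (batch=1, heads=8, seq=16, head_dim=64); requires a CUDA GPU.
q = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 8, 16, 64, device="cuda", dtype=torch.float16)

# Restrict SDPA to the math backend so flash attention is never selected.
with sdpa_kernel(SDPBackend.MATH):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # expected: torch.Size([1, 8, 16, 64])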