The error message is as follows:
root@172:/home/TeleChat-52B-main# python infer.py
FLASH ATTENTION 2 DETECTED
TELECHAT flash attention disabled
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [03:35<00:00, 19.55s/it]
Some weights of TELECHAT were not initialized from the model checkpoint at /home/TeleChat-52B and are newly initialized: ['transformer.wpe.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:492: UserWarning: do_sample is set to False. However, temperature is set to 0.3 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:497: UserWarning: do_sample is set to False. However, top_p is set to 0.85 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:509: UserWarning: do_sample is set to False. However, top_k is set to 5 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:492: UserWarning: do_sample is set to False. However, temperature is set to 0.3 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:497: UserWarning: do_sample is set to False. However, top_p is set to 0.85 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:509: UserWarning: do_sample is set to False. However, top_k is set to 5 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k.
warnings.warn(
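The UserWarnings above only flag a mismatch in the shipped generation config: greedy decoding (do_sample=False) ignores temperature/top_p/top_k. They are separate from the crash below. A minimal sketch of how they could be silenced, assuming you build the generation config yourself in infer.py (this is not the repository's official fix):

```python
from transformers import GenerationConfig

# Sketch only: either enable sampling so temperature/top_p/top_k actually take effect...
sampling_config = GenerationConfig(do_sample=True, temperature=0.3, top_p=0.85, top_k=5)

# ...or keep greedy decoding and drop the sampling-only parameters entirely.
greedy_config = GenerationConfig(do_sample=False)
```

Either config can then be passed as generation_config to model.chat().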
input: <_user>hello<_bot>
Traceback (most recent call last):
File "/home/TeleChat-52B-main/infer.py", line 21, in
answer = model.chat(tokenizer,question, history_input_list = [], history_output_list = [],generation_config = generate_config)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 1091, in chat
output = self.generate(input_ids,generation_config)
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1527, in generate
result = self._greedy_search(
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2411, in _greedy_search
outputs = self(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 1022, in forward
transformer_outputs = self.transformer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 870, in forward
outputs = block(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 555, in forward
attn_outputs = self.attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 485, in forward
attn_output, attn_weights = self._upcast_and_reordered_attn(query, key, value, attention_mask, head_mask)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 382, in _upcast_and_reordered_attn
if not self.is_cross_attention:
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'TELECHATAttention' object has no attribute 'is_cross_attention'
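The failure itself is in the cached remote-code file ~/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py: with flash attention disabled (see the first log line), the forward pass apparently falls back to _upcast_and_reordered_attn, which reads self.is_cross_attention at line 382 even though TELECHATAttention never defines that attribute, so nn.Module.__getattr__ raises the AttributeError above. A minimal workaround sketch, assuming the model only ever uses self-attention (decoder-only), so the missing flag can safely default to False; the loading calls are illustrative and should be adjusted to match your own infer.py:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative load; adjust to match the actual infer.py in the repository.
path = "/home/TeleChat-52B"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True, device_map="auto")

# Workaround sketch: give every TELECHATAttention module the flag that the fallback
# attention path expects. False means plain self-attention (no cross-attention).
for module in model.modules():
    if type(module).__name__ == "TELECHATAttention" and not hasattr(module, "is_cross_attention"):
        module.is_cross_attention = False
```

Equivalently, the cached modeling_telechat.py could be edited to set self.is_cross_attention = False in TELECHATAttention.__init__, or to use getattr(self, "is_cross_attention", False) in the check at line 382.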