The error message is as follows:
root@172:/home/TeleChat-52B-main# python infer.py
FLASH ATTENTION 2 DETECTED
TELECHAT flash attention disabled
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [03:35<00:00, 19.55s/it]
Some weights of TELECHAT were not initialized from the model checkpoint at /home/TeleChat-52B and are newly initialized: ['transformer.wpe.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:492: UserWarning: do_sample is set to False. However, temperature is set to 0.3 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:497: UserWarning: do_sample is set to False. However, top_p is set to 0.85 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:509: UserWarning: do_sample is set to False. However, top_k is set to 5 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:492: UserWarning: do_sample is set to False. However, temperature is set to 0.3 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:497: UserWarning: do_sample is set to False. However, top_p is set to 0.85 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p.
warnings.warn(
/opt/conda/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:509: UserWarning: do_sample is set to False. However, top_k is set to 5 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_k.
warnings.warn(
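The UserWarnings above only flag a mismatch in the shipped generation config: greedy decoding (do_sample=False) ignores temperature/top_p/top_k. They are separate from the crash below. A minimal sketch of how they could be silenced, assuming you build the generation config yourself in infer.py (this is not the repository's official fix):

```python
from transformers import GenerationConfig

# Sketch only: either enable sampling so temperature/top_p/top_k actually take effect...
sampling_config = GenerationConfig(do_sample=True, temperature=0.3, top_p=0.85, top_k=5)

# ...or keep greedy decoding and drop the sampling-only parameters entirely.
greedy_config = GenerationConfig(do_sample=False)
```

Either config can then be passed as generation_config to model.chat().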
input: <_user>hello<_bot>
Traceback (most recent call last):
File "/home/TeleChat-52B-main/infer.py", line 21, in
answer = model.chat(tokenizer,question, history_input_list = [], history_output_list = [],generation_config = generate_config)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 1091, in chat
output = self.generate(input_ids,generation_config)
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1527, in generate
result = self._greedy_search(
File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2411, in _greedy_search
outputs = self(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 1022, in forward
transformer_outputs = self.transformer(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 870, in forward
outputs = block(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 555, in forward
attn_outputs = self.attn(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 164, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 485, in forward
attn_output, attn_weights = self._upcast_and_reordered_attn(query, key, value, attention_mask, head_mask)
File "/root/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py", line 382, in _upcast_and_reordered_attn
if not self.is_cross_attention:
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1269, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'TELECHATAttention' object has no attribute 'is_cross_attention'
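The failure itself is in the cached remote-code file ~/.cache/huggingface/modules/transformers_modules/TeleChat-52B/modeling_telechat.py: with flash attention disabled (see the first log line), the forward pass apparently falls back to _upcast_and_reordered_attn, which reads self.is_cross_attention at line 382 even though TELECHATAttention never defines that attribute, so nn.Module.__getattr__ raises the AttributeError above. A minimal workaround sketch, assuming the model only ever uses self-attention (decoder-only), so the missing flag can safely default to False; the loading calls are illustrative and should be adjusted to match your own infer.py:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative load; adjust to match the actual infer.py in the repository.
path = "/home/TeleChat-52B"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, trust_remote_code=True, device_map="auto")

# Workaround sketch: give every TELECHATAttention module the flag that the fallback
# attention path expects. False means plain self-attention (no cross-attention).
for module in model.modules():
    if type(module).__name__ == "TELECHATAttention" and not hasattr(module, "is_cross_attention"):
        module.is_cross_attention = False
```

Equivalently, the cached modeling_telechat.py could be edited to set self.is_cross_attention = False in TELECHATAttention.__init__, or to use getattr(self, "is_cross_attention", False) in the check at line 382.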