Adjusted the parameters in finetune.py, changing

batch_size: int = 64,
micro_batch_size: int = 8,

to

batch_size: int = 16,
micro_batch_size: int = 2,
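In alpaca-lora-style fine-tuning scripts, the number of gradient-accumulation steps is derived as batch_size // micro_batch_size. It is an assumption, not confirmed in this thread, that finetune.py follows the same convention. If it does, the change above keeps the accumulation ratio at 8 while shrinking the effective batch per optimizer step from 64 to 16 samples, and each forward pass processes 2 samples instead of 8. A minimal sketch of that arithmetic (the helper name is hypothetical):

```python
def accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    """Micro-batches accumulated per optimizer step (alpaca-lora convention, assumed)."""
    # Guard added for this sketch: a non-multiple would silently drop samples with //.
    if batch_size % micro_batch_size != 0:
        raise ValueError("batch_size should be a multiple of micro_batch_size")
    return batch_size // micro_batch_size


# Original settings vs. the adjusted ones from this issue.
original = accumulation_steps(64, 8)
adjusted = accumulation_steps(16, 2)
print(original, adjusted)
```

Only the per-step batch size and memory footprint change; the accumulation ratio stays the same.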

I downloaded the phoenix-inst-chat-7b model to use as the base model.

Running it fails with the following error (some of the weights are not arrays, though not all of them):

File "/opt/python3.10/python3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 303, in shard_checkpoint
    weight_size = weight.numel() * dtype_byte_size(weight.dtype)
AttributeError: 'str' object has no attribute 'numel'

Where should I start looking to track this down?

Does the author have a discussion group for users of this model? If so, could you share it? Many thanks!

I printed the state_dict just before the error. Its entries come in two formats. Normal entries look like this:

('transformer.h.0.self_attention.query_key_value.bias', tensor([-0.1274, -0.2015, -0.0991, ..., 0.0200, -0.0038, 0.0011], device='cuda:0', dtype=torch.float16))

Entries that trigger the error look like this:

('transformer.h.0.self_attention.dense.weight_format', 'col_turing')

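Why would there be entries of the second kind? Do you have any suggestions for fixing this at the root? Is it related to the parameters I changed?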

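We haven't run into this. I'd suggest checking the model-loading code. If you only changed the parameters above, that shouldn't affect how the program runs.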

I found a similar issue on the transformers GitHub; the suggested fix was to upgrade transformers and bitsandbytes. I upgraded both as suggested, but running the script then complained that the CentOS kernel version was too old. Following that message, I upgraded the CentOS kernel (3.1.0 → 5.5.0). Now the run fails with:

Traceback (most recent call last):
  File "/data/AI/GrammarGPT/GrammarGPT-main/finetune.py", line 315, in <module>
    fire.Fire(train)
  File "/opt/python3.10/python3/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/python3.10/python3/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/python3.10/python3/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/data/AI/GrammarGPT/GrammarGPT-main/finetune.py", line 269, in train
    trainer = transformers.Trainer(
  File "/opt/python3.10/python3/lib/python3.10/site-packages/transformers/trainer.py", line 461, in __init__
    raise ValueError(
ValueError: You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the quantized model to correctly perform fine-tuning. Please see: https://huggingface.co/docs/transformers/peft for more details

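This error now occurs even earlier than before; previously it only failed when saving the model. Am I heading in the wrong direction? Could you share your Python version and Linux kernel version?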
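The ValueError says the 8-bit model has no trainable adapters attached. With PEFT this is usually addressed by wrapping the model with LoRA adapters (for example peft's get_peft_model with a LoraConfig) before building the Trainer. As a quick sanity check, which is not the exact condition Trainer tests, one can count how many parameter elements require gradients. The helper trainable_summary and the _FakeParam stand-in below are hypothetical and duck-typed, so the sketch runs without torch.

```python
from typing import Iterable, Tuple


def trainable_summary(parameters: Iterable) -> Tuple[int, int]:
    """Return (trainable, total) element counts.

    Each item must expose ``numel()`` and ``requires_grad``, as torch
    parameters do. After LoRA adapters are attached (which freezes the
    base weights), trainable should be a small fraction of total.
    """
    trainable = total = 0
    for p in parameters:
        n = p.numel()
        total += n
        if p.requires_grad:
            trainable += n
    return trainable, total


class _FakeParam:
    """Stand-in for a torch parameter, used only for the demo below."""

    def __init__(self, n: int, requires_grad: bool) -> None:
        self._n = n
        self.requires_grad = requires_grad

    def numel(self) -> int:
        return self._n


frozen_only = [_FakeParam(100, False), _FakeParam(50, False)]  # all weights frozen
with_adapter = frozen_only + [_FakeParam(8, True)]             # frozen base + small adapter
print(trainable_summary(frozen_only))
print(trainable_summary(with_adapter))
```

With a real model this would be trainable_summary(model.parameters()); peft's PeftModel.print_trainable_parameters() reports similar figures.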

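I still suspect it's related to the model-loading parameters. Loading directly with AutoModelForCausalLM.from_pretrained("phoenix-inst-chat-7b") and then calling model.save_pretrained("***") saves the model successfully. But as soon as I change it to AutoModelForCausalLM.from_pretrained("phoenix-inst-chat-7b", load_in_8bit=True, device_map="auto"), the earlier AttributeError: 'str' object has no attribute 'numel' comes back. Setting load_in_8bit=False fails with a GPU allocation error, so I can't verify whether switching load_in_8bit to True is what causes the problem.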
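A rough estimate of weight memory suggests why load_in_8bit=False fails to allocate on the GPU: the weights alone take about num_params × bytes_per_param. The sketch assumes roughly 7 × 10^9 parameters for phoenix-inst-chat-7b (read off the "7b" in the name) and ignores activations, gradients, optimizer state, CUDA overhead and any layers bitsandbytes keeps in fp16, so real usage is higher. The helper name is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


NUM_PARAMS = 7e9  # assumption: "7b" taken as ~7 billion parameters

fp16 = weight_memory_gib(NUM_PARAMS, 2)  # float16: 2 bytes per parameter
int8 = weight_memory_gib(NUM_PARAMS, 1)  # 8-bit: ~1 byte per parameter
print(f"fp16 ~ {fp16:.1f} GiB, int8 ~ {int8:.1f} GiB")
```

Full fine-tuning would additionally need gradients and optimizer state, which is why 8-bit loading combined with trainable adapters is a common low-memory choice.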

That's probably the cause; we have never loaded the model with load_in_8bit.

FreedomIntelligence / GrammarGPT

Fine-tuning error: in modeling_utils.py, weight_size = weight.numel() * dtype_byte_size(weight.dtype) raises AttributeError: 'str' object has no attribute 'numel' #10