FreedomIntelligence / GrammarGPT

The code and data for GrammarGPT.
Apache License 2.0

Setting use_lora in finetune.py causes a training error: Expected a cuda device, but got: cpu #11

Open lizhao-8202 opened 6 months ago

lizhao-8202 commented 6 months ago

I tried setting use_lora to True in finetune.py. When trainer.train() runs, it raises "Expected a cuda device, but got: cpu". Full traceback:

    Traceback (most recent call last):
      File "/data/AI/GrammarGPT/GrammarGPT-main/finetune.py", line 319, in <module>
        fire.Fire(train)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
        component_trace = _Fire(component, args, parsed_flag_args, context, name)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
        component, remaining_args = _CallAndUpdateTrace(
      File "/opt/python3.10/python3/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
        component = fn(*varargs, **kwargs)
      File "/data/AI/GrammarGPT/GrammarGPT-main/finetune.py", line 308, in train
        trainer.train()
      File "/opt/python3.10/python3/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
        return inner_training_loop(
      File "/opt/python3.10/python3/lib/python3.10/site-packages/transformers/trainer.py", line 2049, in _inner_training_loop
        self._load_best_model()
      File "/opt/python3.10/python3/lib/python3.10/site-packages/transformers/trainer.py", line 2225, in _load_best_model
        load_result = model.load_state_dict(state_dict, False)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2175, in load_state_dict
        load(self, state_dict)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2163, in load
        load(child, child_state_dict, child_prefix)  # noqa: F821
      File "/opt/python3.10/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2163, in load
        load(child, child_state_dict, child_prefix)  # noqa: F821
      [Previous line repeated 5 more times]
      File "/opt/python3.10/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2157, in load
        module._load_from_state_dict(
      File "/opt/python3.10/python3/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 416, in _load_from_state_dict
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys,
      File "/opt/python3.10/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2012, in _load_from_state_dict
        hook(state_dict, prefix, local_metadata, strict, missing_keys, unexpected_keys, error_msgs)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 73, in __call__
        return self.hook(*args, **kwargs)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 366, in maybe_rearrange_weight
        tile_indices = get_tile_inds(weight_format, weight.device)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 247, in get_tile_inds
        return get_inverse_transform_indices(transform, _get_tile_size(format)).to(device)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 79, in get_inverse_transform_indices
        permuted_tile_i = transform_tile(sample_tile_i)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 245, in <lambda>
        transform = lambda x: F.transform(x.to(device), from_order="row", to_order=format)[0].to(x.device)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/bitsandbytes/functional.py", line 2196, in transform
        prev_device = pre_call(A.device)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/bitsandbytes/functional.py", line 417, in pre_call
        torch.cuda.set_device(device)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/torch/cuda/__init__.py", line 397, in set_device
        device = _get_device_index(device)
      File "/opt/python3.10/python3/lib/python3.10/site-packages/torch/cuda/_utils.py", line 34, in _get_device_index
        raise ValueError(f"Expected a cuda device, but got: {device}")
    ValueError: Expected a cuda device, but got: cpu

When the LoRA path is used, do other parameters need adjusting as well, such as resume_from_checkpoint? Or can the error be avoided by passing other arguments when calling train? Also, what was the environment for your original experiments: GPU memory, CPU, RAM, and so on?

lizhao-8202 commented 6 months ago

We want to do Chinese grammar checking, but some special scenarios require training of our own. Our GPU currently has 24 GB of memory.

fjiangAI commented 6 months ago

At the time we did full-parameter fine-tuning on four 80 GB A100 GPUs. We never tested LoRA; this looks like the model was not placed on the GPU. I'd suggest checking the device placement of the model and the other modules before handing them to the trainer, and adjusting from there. Pull requests are also welcome if you solve this.
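
A minimal sketch of such a device check, to run right before trainer.train() (the helper name and the diagnostic are illustrative, not part of finetune.py):

    import torch
    from torch import nn

    def check_device_placement(model: nn.Module) -> None:
        """Print any parameters or buffers that are still on the CPU."""
        assert torch.cuda.is_available(), "bitsandbytes int8 layers need a CUDA device"
        on_cpu = [n for n, p in model.named_parameters() if p.device.type == "cpu"]
        on_cpu += [n for n, b in model.named_buffers() if b.device.type == "cpu"]
        if on_cpu:
            print("Still on CPU:", on_cpu[:10])  # show the first few offenders
        else:
            print("All parameters and buffers are on CUDA.")

    # e.g. call check_device_placement(model) just before trainer.train()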

lizhao-8202 commented 6 months ago

I searched for a long time but could not find the cause on the LoRA side, and I'm not familiar with the transformers framework either. At the time of the error, the relevant values were:

    weight.device: device(type='cpu')
    weight_format: col_turing
    weight: tensor([[ -3,  -6, -30,  ...,   7,  18,  25],
                    [-32,   0,   0,  ..., -18, -42, -37],
                    [ 76,  56, -68,  ...,  30,  59,   9],
                    ...,
                    [ 18,  10,  -3,  ...,   9, -15, -12],
                    [ 12, -31,  24,  ...,   0,  24,   3],
                    [ -4,  37,  25,  ...,  10,  -3, -20]], dtype=torch.int8)

The code around the failure (the failing call is get_tile_inds(weight_format, weight.device) on the second-to-last line):

    weight = state_dict.get(f"{prefix}weight")
    if weight is None:
        # if the state dict has no weights for this layer (e.g., LoRA finetuning), do nothing
        return
    weight_format = state_dict.pop(f"{prefix}weight_format", "row")

    if weight_format != "row":
        tile_indices = get_tile_inds(weight_format, weight.device)
        state_dict[f"{prefix}weight"] = undo_layout(weight, tile_indices)

The prefix is: base_model.model.transformer.h.0.self_attention.query_key_value.base_layer.
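
Since the traceback fails inside Trainer._load_best_model(), which only runs when training finishes with load_best_model_at_end=True, one untested workaround, not confirmed in this thread, is to disable that final reload and restore the checkpoint manually afterwards:

    from transformers import TrainingArguments

    # Untested sketch: skip Trainer._load_best_model(), the step that trips
    # over the CPU-resident int8 weight when the best checkpoint is reloaded.
    # output_dir is a placeholder; keep the script's other arguments as-is.
    args = TrainingArguments(
        output_dir="./output",
        load_best_model_at_end=False,
    )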

lizhao-8202 commented 6 months ago

Alternatively, is there another model that could be fine-tuned successfully with this code (something smaller that can still adjust Chinese content for meaning)? Or could an intermediate checkpoint from the phoenix-inst-chat-7b training process be provided?

fjiangAI commented 6 months ago

phoenix was fine-tuned from bloomz, so the smaller Bloomz variants (560m, 1b, 3b) can all be used. The code itself is not difficult either; the work is mainly in preparing the task-specific data. Once the data is ready, you can also fine-tune with another framework such as LLaMA-Factory.
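
A sketch of what loading one of those smaller variants with LoRA might look like, assuming the peft library rather than the exact setup in finetune.py (the model ID and hyperparameters are illustrative):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    model_name = "bigscience/bloomz-560m"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map="auto",  # place the model on the GPU up front
    )

    lora_config = LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        # BLOOM fuses Q/K/V into a single projection with this name.
        target_modules=["query_key_value"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()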

lizhao-8202 commented 6 months ago

bloomz itself presumably doesn't have Chinese grammar-correction ability out of the box, right? To get Chinese grammar correction, is it enough to fine-tune Bloomz on a task-specific dataset with LLaMA-Factory? My AI knowledge is fairly limited, so apologies if this is a basic question.
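
For reference, the task-specific data mentioned above is typically instruction-style JSON; a sketch of one hypothetical record in the alpaca format that LLaMA-Factory accepts (the sentences are invented for illustration, not taken from the GrammarGPT dataset):

    import json

    # One illustrative grammar-correction record in alpaca format
    # (instruction / input / output); the values are made-up examples.
    examples = [
        {
            "instruction": "对以下句子进行语法纠错",
            "input": "他昨天去了北京开会议。",
            "output": "他昨天去北京开了会。",
        },
    ]

    with open("grammar_sft.json", "w", encoding="utf-8") as f:
        json.dump(examples, f, ensure_ascii=False, indent=2)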