OpenMOSS / CoLLiE

Collaborative Training of Large Language Models in an Efficient Way
https://openlmlab-collie.readthedocs.io
Apache License 2.0

AttributeError: 'PeftModelForCausalLM' object has no attribute 'set_cache' #140

Closed JiafeiSun closed 11 months ago

JiafeiSun commented 11 months ago

Running internlm 7b LoRA with a 2-stage pipeline using the command below fails with AttributeError: 'PeftModelForCausalLM' object has no attribute 'set_cache'. What is the cause?

CUDA_VISIBLE_DEVICES=0,1 torchrun --rdzv_backend=c10d --rdzv_endpoint=localhost:29402 --nnodes=1 --nproc_per_node=2 finetune_internlm_for_classification.py

[01:45:34] INFO     Pipeline initialization starts, the provided loss_fn is not currently being used; it will be utilized in trainer.                                                                 base.py:128
SEED_LAYERS=False BASE_SEED=42 SEED_FN=None
[2023-11-30 01:45:34,008] [INFO] [module.py:375:_partition_layers] Partitioning pipeline stages with method parameters
stage=0 layers=17
     0: _inner
     1: InternLMLayer
     2: InternLMLayer
     3: InternLMLayer
     4: InternLMLayer
     5: InternLMLayer
     6: InternLMLayer
     7: InternLMLayer
     8: InternLMLayer
     9: InternLMLayer
    10: InternLMLayer
    11: InternLMLayer
    12: InternLMLayer
    13: InternLMLayer
    14: InternLMLayer
    15: InternLMLayer
    16: InternLMLayer
stage=1 layers=18
    17: InternLMLayer
    18: InternLMLayer
    19: InternLMLayer
    20: InternLMLayer
    21: InternLMLayer
    22: InternLMLayer
    23: InternLMLayer
    24: InternLMLayer
    25: InternLMLayer
    26: InternLMLayer
    27: InternLMLayer
    28: InternLMLayer
    29: InternLMLayer
    30: InternLMLayer
    31: InternLMLayer
    32: InternLMLayer
    33: _inner
    34: _inner
  loss: GPTLMLoss
trainable params: 3,145,728 || all params: 3,664,117,760 || trainable%: 0.08585226256483634
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 434, in __getattr__
    return super().__getattr__(name)  # defer to nn.Module's logic
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1646, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'PeftModelForCausalLM' object has no attribute 'set_cache'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/peft/tuners/lora.py", line 492, in __getattr__
    return super().__getattr__(name)  # defer to nn.Module's logic
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1646, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LoraModel' object has no attribute 'set_cache'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "finetune_internlm_for_classification.py", line 187, in <module>
    main(args)
  File "finetune_internlm_for_classification.py", line 128, in main
    model.set_cache(False)
  File "/usr/local/lib/python3.8/dist-packages/peft/peft_model.py", line 436, in __getattr__
    return getattr(self.base_model, name)
  File "/usr/local/lib/python3.8/dist-packages/peft/tuners/lora.py", line 494, in __getattr__
    return getattr(self.model, name)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1646, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'PipelineModel' object has no attribute 'set_cache'
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 839 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 838) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==2.1.0a0+fe05266', 'console_scripts', 'torchrun')())
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.8/dist-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
KaiLv69 commented 11 months ago

Hi, when pipeline parallelism (pp) is used, the model is actually wrapped in the PipelineModel class, which does not have a set_cache method. When using pp, you can simply delete the set_cache line; it has no effect on training :)
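If you prefer not to hard-delete the line, a defensive alternative is to call set_cache only when the wrapped model actually exposes it. This is a hypothetical sketch, not CoLLiE's own API: `disable_cache_if_supported` and its argument name are made up here, and it assumes (per the maintainer's reply) that skipping the call is harmless under pipeline parallelism. Note that PEFT's `__getattr__` delegation raises AttributeError when the attribute is missing everywhere in the chain, which `getattr` with a default catches cleanly.

```python
def disable_cache_if_supported(model):
    """Call model.set_cache(False) only if the (possibly wrapped) model has it.

    Under pipeline parallelism the model is a PipelineModel without
    set_cache, so we skip the call instead of crashing; per the
    maintainer this does not affect training.
    """
    set_cache = getattr(model, "set_cache", None)  # returns None instead of raising
    if callable(set_cache):
        set_cache(False)
```

In the failing script, `model.set_cache(False)` at line 128 of finetune_internlm_for_classification.py would become `disable_cache_if_supported(model)`.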

JiafeiSun commented 11 months ago

> Hi, when pipeline parallelism (pp) is used, the model is actually wrapped in the PipelineModel class, which does not have a set_cache method. When using pp, you can simply delete the set_cache line; it has no effect on training :)

Okay, thank you.