OpenLMLab / LOMO

LOMO: LOw-Memory Optimization
MIT License

type object 'torch._C._distributed_c10d.ReduceOp' has no attribute 'AVG' #32

Closed season1blue closed 1 year ago

season1blue commented 1 year ago
Traceback (most recent call last):
  File "src/train_lomo.py", line 136, in <module>
    train()
  File "src/train_lomo.py", line 129, in train
    trainer.train()
  File "/workspace/LOMO/src/lomo_trainer.py", line 116, in train
    self.optimizer.grad_norm(loss)
  File "/workspace/LOMO/src/lomo.py", line 186, in grad_norm
    loss.backward(retain_graph=True)
  File "/opt/conda/lib/python3.7/site-packages/torch/_tensor.py", line 307, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/__init__.py", line 156, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
  File "/workspace/LOMO/src/lomo.py", line 117, in func
    torch.distributed.all_reduce(p.grad, op=torch.distributed.ReduceOp.AVG, async_op=False)
AttributeError: type object 'torch._C._distributed_c10d.ReduceOp' has no attribute 'AVG'

https://github.com/OpenLMLab/LOMO/blob/ee7d431344569bc69ff7283b70141b5c6d66c901/src/lomo.py#L117C23-L117C23

Is this a problem with my torch version, and how should I fix it? My torch version is 1.10.0. Thanks for your reply.


Earlier I got this error:

ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

so I changed

tokenizer = AutoTokenizer.from_pretrained(
        model_args.model_name_or_path,
        use_fast=False,
        padding_side='left'
    )

to

tokenizer = LlamaTokenizer.from_pretrained(
        model_args.model_name_or_path,
        use_fast=False,
        padding_side='left'
    )

Is the AttributeError related to this change?

KaiLv69 commented 1 year ago

Hi, yes, this is a torch version issue. FYI: https://pytorch.org/docs/1.10/distributed.html?highlight=torch%20distributed%20reduceop#torch.distributed.ReduceOp
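A quick way to confirm the diagnosis is to check whether the installed torch exposes `ReduceOp.AVG` at all. The sketch below is only illustrative: `has_avg_reduce_op` and `check_installed_torch` are hypothetical helper names, not part of LOMO or PyTorch, and the second helper assumes torch was built with distributed support.

```python
def has_avg_reduce_op(reduce_op_cls) -> bool:
    """Return True if the given ReduceOp class exposes an AVG member."""
    return hasattr(reduce_op_cls, "AVG")


def check_installed_torch() -> bool:
    """Check the locally installed torch (hypothetical helper).

    torch is imported lazily so the check above stays usable without it.
    Assumes torch.distributed is available in this build.
    """
    import torch.distributed as dist

    return has_avg_reduce_op(dist.ReduceOp)


if __name__ == "__main__":
    import torch

    print("torch version:", torch.__version__)
    print("ReduceOp.AVG available:", check_installed_torch())
```

On the torch 1.10.0 install from the traceback above, the second line would be expected to report `False`, which matches the AttributeError.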

season1blue commented 1 year ago

Thanks for the reply. Which version should I upgrade to?

season1blue commented 1 year ago

Or, which version of torch are you using?

KaiLv69 commented 1 year ago

> Or, which version of torch are you using?

I am using torch 2.0.
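Upgrading torch is the fix suggested in this thread. For anyone who cannot upgrade, one possible workaround (not something the maintainers proposed) is to replace the `ReduceOp.AVG` call with a `SUM` reduction followed by a division by the world size. The sketch below assumes an already initialized process group; `all_reduce_mean` is a hypothetical helper name, and the `dist` parameter exists only so the function can be exercised without a real process group.

```python
def all_reduce_mean(tensor, dist=None):
    """Average `tensor` in place across ranks without using ReduceOp.AVG.

    Hypothetical helper, not part of LOMO. `dist` defaults to
    torch.distributed; a process group must already be initialized.
    """
    if dist is None:
        import torch.distributed as dist

    # Sum the tensor over all ranks, then divide by the number of ranks.
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM, async_op=False)
    tensor.div_(dist.get_world_size())
    return tensor
```

In `src/lomo.py` this would stand in for the `all_reduce(p.grad, op=torch.distributed.ReduceOp.AVG, ...)` call shown in the traceback, for example `all_reduce_mean(p.grad)`. Summing before dividing can behave differently from a native average in low precision (for example, fp16 sums may overflow), so treat this as an untested stopgap rather than an exact equivalent.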