Open XxSuper opened 7 months ago
IMPORTANT: Use deepspeed==0.7.0 pytorch-lightning==1.9.2 torch 1.13.1+cu117
Thanks for the guidance; the issue above is resolved. However, training now crashes when running on two GPUs. What could be causing this?

Loading extension module fused_adam...
Time to load fused_adam op: 2.221088409423828 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 2.2072091102600098 seconds
/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead
  warnings.warn(
/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py:429: UserWarning: torch.distributed.distributed_c10d._get_global_rank is deprecated please use torch.distributed.distributed_c10d.get_global_rank instead
  warnings.warn(
Bus error (core dumped)
I need to see the specific error. Please post the complete output.
With torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1+cu118 deepspeed 0.12.4 pytorch-lightning 2.1.2, I get this error:
AttributeError: 'MyDataset' object has no attribute 'global_rank'
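For anyone hitting the same AttributeError: the message only says that the dataset object never had a `global_rank` attribute set on it. One possible workaround, as a rough sketch and not the repository's actual code, is to have the dataset store the rank itself. The `MyDataset` below is a hypothetical stand-in for the class in the traceback, and it assumes the launcher exports `RANK` and `WORLD_SIZE` (torchrun does; other launchers may differ, so passing the values explicitly is safer).

```python
import os


class MyDataset:
    """Hypothetical stand-in for the dataset named in the traceback."""

    def __init__(self, samples, global_rank=None, world_size=None):
        # Assumption: the launcher exports RANK / WORLD_SIZE.
        # Fall back to single-process values when they are absent.
        if global_rank is None:
            global_rank = int(os.environ.get("RANK", "0"))
        if world_size is None:
            world_size = int(os.environ.get("WORLD_SIZE", "1"))

        # Setting these in __init__ means later reads of self.global_rank
        # cannot raise AttributeError.
        self.global_rank = global_rank
        self.world_size = world_size

        # Illustrative use of the rank: each process keeps every
        # world_size-th sample, starting at its own rank.
        self.samples = list(samples)[global_rank::world_size]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]
```

Usage would look like `MyDataset(data, global_rank=trainer.global_rank, world_size=trainer.world_size)` if the dataset is built after the Lightning trainer exists; whether that fits this project's data-loading code is something to check against the full traceback.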