NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

deepcopy DistributedDataParallel loses actual model #451

Open meijieru opened 5 years ago

meijieru commented 5 years ago
```python
import copy

import torch
import torch.nn as nn
import apex

# Assumes torch.distributed has already been initialized.
model = nn.Linear(10, 2).cuda()
torch_wrapper = torch.nn.parallel.DistributedDataParallel(model)
apex_wrapper = apex.parallel.DistributedDataParallel(model)

lhs = copy.deepcopy(torch_wrapper).module  # ok
rhs = copy.deepcopy(apex_wrapper).module  # fails: AttributeError: 'DistributedDataParallel' object has no attribute 'module'
```
ASDen commented 4 years ago

@mcarilli any ideas on this? I am facing the same problem. I know torch DDP is recommended now, but are there any workarounds for this?
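One possible workaround (a sketch, not an official fix from the apex maintainers): deepcopy the wrapped model that the wrapper exposes as `.module`, rather than the wrapper object itself. The apex wrapper line below is commented out because it requires apex and an initialized process group; the plain-module deepcopy demonstrates the same idea.

```python
import copy

import torch.nn as nn

model = nn.Linear(10, 2)
# wrapper = apex.parallel.DistributedDataParallel(model)  # requires apex + torch.distributed init
# snapshot = copy.deepcopy(wrapper.module)                # copy the plain module, not the wrapper

# Equivalent demonstration without apex: wrapper.module is just `model`.
snapshot = copy.deepcopy(model)

# The copy is an independent module with identical parameters.
assert snapshot is not model
assert all(
    (p_copy == p_orig).all()
    for p_copy, p_orig in zip(snapshot.parameters(), model.parameters())
)
```

If a fresh wrapper is needed afterwards, the copied module can be re-wrapped in a new `DistributedDataParallel` instance.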

Peidong-Wang commented 2 years ago

Check out https://discuss.pytorch.org/t/torch-cuda-amp-vs-nvidia-apex/74994/2
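The linked discussion recommends the mixed-precision support that was upstreamed into PyTorch (`torch.cuda.amp`) together with native `torch.nn.parallel.DistributedDataParallel`, whose wrapper deepcopies cleanly per the snippet above. A minimal single-device sketch of the `torch.cuda.amp` training step (the loss/optimizer choices here are illustrative; on a CPU-only machine autocast and the scaler are simply disabled):

```python
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"

model = nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

x = torch.randn(4, 10, device=device)
target = torch.randn(4, 2, device=device)

with torch.cuda.amp.autocast(enabled=use_cuda):
    # Forward pass runs in mixed precision when CUDA is available.
    loss = nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()  # scaling is a no-op when the scaler is disabled
scaler.step(optimizer)
scaler.update()
```

For multi-GPU training, the same loop applies after wrapping `model` in `torch.nn.parallel.DistributedDataParallel`.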