When args.distributed is set to True:

```python
model = torch.nn.parallel.DistributedDataParallel(model)
teacher = torch.nn.DataParallel(teacher, device_ids=[0, 1, 2, 3, 4, 5, 6, 7])
teacher.cuda()
```

Why does the student model use torch.nn.parallel.DistributedDataParallel() while the teacher model uses torch.nn.DataParallel?
You just need to call teacher.cuda(args.gpu); there is no need to wrap the teacher in DP or DDP.

```python
teacher.cuda(args.gpu)
```
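For context, here is a minimal sketch of why this works in a one-process-per-GPU DDP launch. The nn.Linear stand-ins and the LOCAL_RANK-based gpu variable are placeholders (torchrun / torch.distributed.launch exports LOCAL_RANK), not the repo's actual models or argument names:

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist

# Placeholder networks standing in for the real student and teacher.
student = nn.Linear(128, 10)
teacher = nn.Linear(128, 10)

# One process per GPU; the launcher exports LOCAL_RANK for each process.
gpu = int(os.environ.get("LOCAL_RANK", 0))
dist.init_process_group(backend="nccl", init_method="env://")
torch.cuda.set_device(gpu)

# The student trains, so its gradients must be synchronized across
# processes: wrap it in DistributedDataParallel.
student = student.cuda(gpu)
student = nn.parallel.DistributedDataParallel(student, device_ids=[gpu])

# The teacher is forward-only: each process keeps a frozen replica on its
# own GPU, so neither DataParallel nor DistributedDataParallel is needed.
teacher = teacher.cuda(gpu)
teacher.eval()
for p in teacher.parameters():
    p.requires_grad = False
```

Since every process already holds a full teacher copy on its own device and no teacher gradients are ever computed, wrapping it in DP or DDP would only add replication and synchronization overhead.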