Link-Li / Balanced-DataParallel

This is an improved version of PyTorch's DataParallel that balances the memory usage of the first GPU.
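For context, a minimal usage sketch. The BalancedDataParallel(gpu0_bsz, module, dim=0) signature matches the calls later in this thread; the model and batch sizes are illustrative.

    import torch
    import torch.nn as nn
    from BalancedDataParallel import BalancedDataParallel  # the class from this repo

    # gpu0_bsz caps how many samples of each batch are placed on GPU 0,
    # which also gathers the outputs, so its memory use stays balanced
    # with the other GPUs.
    net = nn.Linear(128, 10)
    net = BalancedDataParallel(2, net, dim=0).cuda()  # gpu0_bsz=2

    x = torch.randn(6, 128).cuda()  # batch of 6, split across the GPUs
    y = net(x)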

How should the arguments be passed? #2

Closed xwjBupt closed 4 years ago

xwjBupt commented 4 years ago

Thanks for sharing this, but I get an error at initialization. Could you take a look? Much appreciated! Here is my code:

    net = buildmodel(args.netname)

    if len(args.num_gpus) > 1:
        net = BalancedDataParallel(gpu0_bsz=args.maingpu_bs, net, dim=0)
        # net = torch.nn.DataParallel(net)
    net.cuda()
    print('init model')

The error:

    net = BalancedDataParallel(gpu0_bsz=args.maingpu_bs, net, dim=0)

    SyntaxError: positional argument follows keyword argument

How can I fix this? Many thanks.

Link-Li commented 4 years ago

You could just search Baidu for this; it's a Python syntax issue. Change it to net = BalancedDataParallel(args.maingpu_bs, net, dim=0)
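For reference, a minimal illustration of the rule Python is enforcing here (the function is a toy example, unrelated to this repo):

    def f(a, b, dim=0):
        return a, b, dim

    # f(b=1, 2)     # SyntaxError: positional argument follows keyword argument
    f(1, 2, dim=0)  # OK: every positional argument comes before any keyword argument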

xwjBupt commented 4 years ago

Thanks for the reply. With that change, training now runs, but validation fails. I am using 3 GPUs with a total batch size of 6 and maingpu_bs = 2; the validation batch size is 1. The error is:

    Traceback (most recent call last):
      File "local_train.py", line 459, in <module>
        net = buildmodel(args.netname, *args.netkwargs)
      File "local_train.py", line 208, in val
        result = net(scans)
      File "/home/xwj/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/xwj/MyBrats/Lib/BalancedDataParallel.py", line 70, in forward
        outputs = self.parallel_apply(replicas, device_ids, inputs, kwargs)
      File "/home/xwj/MyBrats/Lib/BalancedDataParallel.py", line 74, in parallel_apply
        return parallel_apply(replicas, inputs, kwargs, device_ids)
      File "/home/xwj/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 37, in parallel_apply
        assert len(modules) == len(inputs)
    AssertionError

Is it because the validation batch size does not match the configured maingpu_bs? Thanks!

Link-Li commented 4 years ago

How would you split a single sample across three GPUs…
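To make the mismatch concrete, a small sketch of why the assert fires (the tensor shape is illustrative): scatter can only produce as many chunks as there are samples, while parallel_apply expects one input per model replica.

    import torch

    # A batch of 1 cannot be split three ways: chunk() returns a single
    # piece, so parallel_apply would receive 1 input for 3 replicas and
    # its assert len(modules) == len(inputs) fails.
    batch = torch.randn(1, 3, 224, 224)
    chunks = batch.chunk(3, dim=0)
    print(len(chunks))  # 1, not 3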

xwjBupt commented 4 years ago

So when my validation batch size is 1, I can't use this package with multiple GPUs? Plain DataParallel works fine for me. Thanks again!

Link-Li commented 4 years ago

Set the gpu0_bsz parameter to 1 when the batch size is 1.
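A sketch of that suggestion, reusing the names from the question above (whether gpu0_bsz=1 resolves the validation case is the maintainer's claim, not verified here):

    # Training: total batch size 6 on 3 GPUs, args.maingpu_bs (=2) on GPU 0.
    train_net = BalancedDataParallel(args.maingpu_bs, net, dim=0)

    # Validation: batch size 1, so keep the single sample on GPU 0.
    # Evaluating on the bare module (net) should also work.
    val_net = BalancedDataParallel(1, net, dim=0)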