bubbliiiing / Siamese-pytorch

A Siamese neural network library for comparing the similarity of images.
MIT License
605 stars, 129 forks

Multi-GPU training fails and I couldn't find a fix: ValueError: Caught ValueError in replica 0 on device 0. #2

Open AvlTreeQL opened 3 years ago

AvlTreeQL commented 3 years ago

Traceback (most recent call last):
  File "/home/avltree/Projects/Siamese-pytorch/train.py", line 202, in <module>
    fit_one_epoch(net,loss,epoch,epoch_size,epoch_size_val,gen,gen_val,Freeze_Epoch,Cuda)
  File "/home/avltree/Projects/Siamese-pytorch/train.py", line 62, in fit_one_epoch
    outputs = net(images)
  File "/home/avltree/miniconda3/envs/siamese/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/avltree/miniconda3/envs/siamese/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/avltree/miniconda3/envs/siamese/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/avltree/miniconda3/envs/siamese/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/avltree/miniconda3/envs/siamese/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/avltree/miniconda3/envs/siamese/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/avltree/miniconda3/envs/siamese/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/avltree/Projects/Siamese-pytorch/nets/siamese.py", line 29, in forward
    x1, x2 = x
ValueError: not enough values to unpack (expected 2, got 1)
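One plausible cause, under an assumption not verified against this repo's train.py: if images is a single tensor that stacks the two branches along dim 0, i.e. shape (2, batch, C, H, W), then torch.nn.DataParallel scatters it along its default dim=0. That splits the pair dimension instead of the batch, so with two GPUs each replica receives a tensor of shape (1, batch, C, H, W), and x1, x2 = x in forward finds only one element. The sketch below reproduces this on CPU with torch.chunk, which is how the scatter splits a tensor; the sizes are illustrative only.

```python
import torch

n_gpus = 2                   # number of devices DataParallel would scatter to
batch, c, h, w = 4, 3, 8, 8  # illustrative sizes only

# Assumed layout: both branches stacked on dim 0 -> (2, batch, C, H, W)
images = torch.randn(2, batch, c, h, w)

# DataParallel's default scatter chunks each input tensor along dim=0,
# which here is the pair dimension, not the batch dimension.
replica_inputs = torch.chunk(images, n_gpus, dim=0)
print([tuple(t.shape) for t in replica_inputs])  # [(1, 4, 3, 8, 8), (1, 4, 3, 8, 8)]

def forward(x):
    x1, x2 = x  # same unpacking as nets/siamese.py, line 29
    return x1, x2

err = None
try:
    forward(replica_inputs[0])  # what replica 0 would see
except ValueError as e:
    err = str(e)
print(err)  # not enough values to unpack (expected 2, got 1)
```

With one GPU the chunk is the whole (2, batch, C, H, W) tensor, which would explain why net = model (or a single card) runs fine.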

bubbliiiing commented 3 years ago

I don't have multiple GPUs... so I don't know much about this for now.

LittleInorganic commented 3 years ago

I have this problem too. Has the original poster solved it?

bubbliiiing commented 3 years ago

I don't have multiple GPUs, so I can't test this, but the error doesn't look like a multi-GPU problem.

LittleInorganic commented 3 years ago

> I don't have multiple GPUs, so I can't test this, but the error doesn't look like a multi-GPU problem.

Ah, thanks anyway! I'll dig into it some more.

tkamkb commented 3 years ago

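I have the same problem. Has the original poster solved it yet? I have two GPUs. The problem seems to be at net = torch.nn.DataParallel(model): when I change it to net = model, training runs, but I assume it can then only use a single GPU?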
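A sketch of a fix that keeps DataParallel, assuming images has the (2, batch, C, H, W) layout described above: pass the two branches as a list, e.g. net([images[0], images[1]]). DataParallel's scatter recurses into lists and chunks each tensor along dim 0, so every replica gets [x1_part, x2_part] split on the batch dimension, and the existing x1, x2 = x keeps working. A real scatter needs CUDA devices, so mimic_scatter below is a hypothetical CPU stand-in that follows the same recursion, and ToySiamese is a hypothetical toy model, not the repo's network. This has not been tested on real multi-GPU hardware.

```python
import torch
import torch.nn as nn

def mimic_scatter(obj, n_devices, dim=0):
    """Hypothetical CPU stand-in for DataParallel's scatter: tensors are chunked
    along `dim`; lists/tuples are scattered element-wise and regrouped per device."""
    if isinstance(obj, torch.Tensor):
        return list(torch.chunk(obj, n_devices, dim=dim))
    if isinstance(obj, (list, tuple)) and len(obj) > 0:
        per_elem = [mimic_scatter(o, n_devices, dim) for o in obj]
        return [type(obj)(group) for group in zip(*per_elem)]
    return [obj for _ in range(n_devices)]  # non-tensors are copied to every device

class ToySiamese(nn.Module):
    """Hypothetical toy model; only the two-branch input contract matters here."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16))
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        x1, x2 = x                             # expects exactly two branches
        f1, f2 = self.encoder(x1), self.encoder(x2)
        return self.head(torch.abs(f1 - f2))   # one logit per pair

n_gpus, batch = 2, 4
images = torch.randn(2, batch, 3, 8, 8)        # assumed layout: (2, batch, C, H, W)
model = ToySiamese()

# Forward args arrive as a tuple; its single element is now a list of two branches.
replicas = mimic_scatter(([images[0], images[1]],), n_gpus)
# Run each replica, then gather along dim 0 as DataParallel does by default.
outputs = torch.cat([model(*args) for args in replicas], dim=0)
print(outputs.shape)  # torch.Size([4, 1])
```

In train.py this would roughly mean changing the call in fit_one_epoch to outputs = net([images[0], images[1]]) while keeping net = torch.nn.DataParallel(model). Setting DataParallel(model, dim=1) instead would also split the batch, but it gathers outputs along dim 1 too, so per-replica outputs of shape (batch/2, 1) would come back as (batch/2, 2) rather than (batch, 1).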

bubbliiiing commented 3 years ago

Hmm, I don't have two GPUs right now, so I can't test it yet.