Cysu / open-reid

Open source person re-identification library in python
https://cysu.github.io/open-reid/
MIT License

Zero-dimensional tensor concatenation problem #69

Open leobxpan opened 6 years ago

leobxpan commented 6 years ago

Hi there,

Thank you for the code!

While training the ResNet50 model using the market1501 dataset, I got the following Runtime error:

Traceback (most recent call last):
  File "examples/triplet_loss.py", line 232, in <module>
    main(parser.parse_args())
  File "examples/triplet_loss.py", line 151, in main
    trainer.train(epoch, train_loader, optimizer)
  File "/home/bxpan/.local/lib/python3.5/site-packages/open_reid-0.2.0-py3.5.egg/reid/trainers.py", line 33, in train
  File "/home/bxpan/.local/lib/python3.5/site-packages/open_reid-0.2.0-py3.5.egg/reid/trainers.py", line 83, in _forward
  File "/home/bxpan/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bxpan/.local/lib/python3.5/site-packages/open_reid-0.2.0-py3.5.egg/reid/loss/triplet.py", line 26, in forward
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

The problem turned out to be this line in triplet.py: dist_ap = torch.cat(dist_ap). I printed out dist_ap, and it is a Python list of zero-dimensional tensors (each prints as torch.Size([])). I used a batch size of 64, so the list has length 64:

[tensor(0.2895, device='cuda:0'), tensor(0.3334, device='cuda:0'), tensor(0.3334, device='cuda:0'), tensor(0.3175, device='cuda:0'), tensor(0.3078, device='cuda:0'), tensor(0.3078, device='cuda:0'), tensor(0.3045, device='cuda:0'), tensor(0.3045, device='cuda:0'), tensor(0.2636, device='cuda:0'), tensor(0.2630, device='cuda:0'), tensor(0.2497, device='cuda:0'), tensor(0.2636, device='cuda:0'), tensor(0.2967, device='cuda:0'), tensor(0.2657, device='cuda:0'), tensor(0.2967, device='cuda:0'), tensor(0.2936, device='cuda:0'), tensor(0.3517, device='cuda:0'), tensor(0.2939, device='cuda:0'), tensor(0.3517, device='cuda:0'), tensor(0.3185, device='cuda:0'), tensor(0.3318, device='cuda:0'), tensor(0.3357, device='cuda:0'), tensor(0.3260, device='cuda:0'), tensor(0.3357, device='cuda:0'), tensor(0.2928, device='cuda:0'), tensor(0.2906, device='cuda:0'), tensor(0.2928, device='cuda:0'), tensor(0.2906, device='cuda:0'), tensor(0.1992, device='cuda:0'), tensor(0.2086, device='cuda:0'), tensor(0.2086, device='cuda:0'), tensor(0.2040, device='cuda:0'), tensor(0.2742, device='cuda:0'), tensor(0.2836, device='cuda:0'), tensor(0.3117, device='cuda:0'), tensor(0.3117, device='cuda:0'), tensor(0.2838, device='cuda:0'), tensor(0.2686, device='cuda:0'), tensor(0.2435, device='cuda:0'), tensor(0.2838, device='cuda:0'), tensor(0.3124, device='cuda:0'), tensor(0.3268, device='cuda:0'), tensor(0.3304, device='cuda:0'), tensor(0.3304, device='cuda:0'), tensor(0.2591, device='cuda:0'), tensor(0.2671, device='cuda:0'), tensor(0.2825, device='cuda:0'), tensor(0.2825, device='cuda:0'), tensor(0.3309, device='cuda:0'), tensor(0.2836, device='cuda:0'), tensor(0.3126, device='cuda:0'), tensor(0.3309, device='cuda:0'), tensor(0.3232, device='cuda:0'), tensor(0.3493, device='cuda:0'), tensor(0.3493, device='cuda:0'), tensor(0.3379, device='cuda:0'), tensor(0.3044, device='cuda:0'), tensor(0.3173, device='cuda:0'), tensor(0.3173, device='cuda:0'), tensor(0.3009, device='cuda:0'), tensor(0.2941, 
device='cuda:0'), tensor(0.3048, device='cuda:0'), tensor(0.3048, device='cuda:0'), tensor(0.2704, device='cuda:0')]

The tensor values themselves look fine, but the concatenation fails. Any idea what the problem is?
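For reference, the failure reproduces in isolation with just two scalar tensors (a minimal sketch, using made-up values, assuming PyTorch 0.4+ semantics):

```python
import torch

# Each element mimics an entry of dist_ap: a 0-dim (scalar) tensor.
scalars = [torch.tensor(0.2895), torch.tensor(0.3334)]

try:
    torch.cat(scalars)  # cat requires tensors with at least one dimension
except RuntimeError as e:
    print(e)
```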

Thank you very much.

Boxiao

frhf commented 6 years ago

Hi,

I got the same problem recently. I think it is connected to a newer version of pytorch.

What worked for me was replacing torch.cat with torch.stack, but I am not entirely sure whether this solution is free of side effects.
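Sketched on a toy list (assuming the elements are 0-dim tensors, as in the printout above):

```python
import torch

dist_ap = [torch.tensor(0.2895), torch.tensor(0.3334)]  # 0-dim elements

# torch.stack creates the new dimension itself, so 0-dim inputs are fine.
out = torch.stack(dist_ap)
print(out)        # tensor([0.2895, 0.3334])
print(out.shape)  # torch.Size([2])
```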

Regards Frank

leobxpan commented 6 years ago

Hi Frank,

Sorry for the late response. I've tried your solution and it works. I'm not sure whether torch.cat() could concatenate zero-dimensional tensors in previous versions of PyTorch; if it could, this might be related to a version change. It would be best if the author could check this @Cysu

Boxiao

lijianhackthon commented 6 years ago

Yeah, I came across the same issue and my PyTorch version is 0.4. Hope the author @Cysu can look into this.

diaoenmao commented 5 years ago

Actually, this issue is fixed for me at 0.4.1

insikk commented 5 years ago

@dem123456789 It works fine with pytorch 0.3.0. I saw this error on pytorch 0.4.0, so I upgraded it to 0.4.1. Problem still exists.

mhyousefi commented 5 years ago

I'm having a similar problem, where I can't concatenate the elements of a list of zero-dimensional tensors:

import torch
from torch.autograd import Variable

def basic_fun(x_cloned):
    res = []
    for i in range(len(x)):
        res.append(x_cloned[i] * x_cloned[i])
    print(res)
    return torch.cat(res)

def get_grad(inp, grad_var):
    A = basic_fun(inp)
    A.backward()
    return grad_var.grad

x = Variable(torch.FloatTensor([1, 2]), requires_grad=True)
x_cloned = x.clone()
print(get_grad(x_cloned, x))

Here are my terminal logs:

[tensor(1., grad_fn=<ThMulBackward>), tensor(4., grad_fn=<ThMulBackward>)]
Traceback (most recent call last):
  File "<path>/playground.py", line 23, in <module>
    print(get_grad(x_cloned, x))
  File "<path>/playground.py", line 16, in get_grad
    A = basic_fun(inp)
  File "<path>/playground.py", line 12, in basic_fun
    return torch.cat(res)
RuntimeError: zero-dimensional tensor (at position 0) cannot be concatenated

mhyousefi commented 5 years ago

Alright, so apparently I need to do torch.stack(res, dim=0). It produces: tensor([1., 4.], grad_fn=<StackBackward>)

Florian1990 commented 5 years ago

I'm having a similar problem, where I can't concatenate the elements of a list of zero-dimensional tensors:


def basic_fun(x_cloned):
    res = []
    for i in range(len(x)):
        res.append(x_cloned[i] * x_cloned[i])
    print(res)
    return torch.cat(res)

Indexing a one-dimensional tensor with a single integer yields a zero-dimensional tensor, which cannot be concatenated. To get one-dimensional tensors instead, you can index with x_cloned[i, None].
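A quick check of the difference (a sketch with hypothetical values, assuming a 1-D input tensor):

```python
import torch

x_cloned = torch.tensor([1.0, 2.0])

print(x_cloned[0].shape)        # torch.Size([])  -> 0-dim, cat fails
print(x_cloned[0, None].shape)  # torch.Size([1]) -> 1-dim, cat works

res = [x_cloned[i, None] * x_cloned[i, None] for i in range(len(x_cloned))]
print(torch.cat(res))  # tensor([1., 4.])
```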

Side note: I am not sure what you are doing in production, but element-wise multiplication in PyTorch is easily done with the * operator:

def basic_fun(x_cloned):
    return x_cloned * x_cloned
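A quick sanity check that the vectorized version still backpropagates (a sketch; the result is summed to a scalar so backward() needs no argument):

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

loss = (x * x).sum()  # element-wise square, reduced to a scalar
loss.backward()

print(x.grad)  # tensor([2., 4.]), since d(x^2)/dx = 2x
```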

GR4HAM commented 5 years ago

Another option is to use unsqueeze to turn a 0-dim tensor into a 1-dim tensor: res.append((x_cloned[i] * x_cloned[i]).unsqueeze(0))
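Applied to the loop above (a sketch with toy values):

```python
import torch

x_cloned = torch.tensor([1.0, 2.0])

res = []
for i in range(len(x_cloned)):
    # unsqueeze(0) promotes the 0-dim product to shape (1,), so cat works
    res.append((x_cloned[i] * x_cloned[i]).unsqueeze(0))

print(torch.cat(res))  # tensor([1., 4.])
```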

themis0888 commented 5 years ago

Well, it seems that earlier versions supported that operation, but from 0.4 onward you should unsqueeze the tensors that are the elements of the list. You can do that with:

for i in range(len(lst)):
    lst[i] = torch.unsqueeze(lst[i], dim=-1)

The list should then look like this: [tensor([0.2895], device='cuda:0'), tensor([0.3895], device='cuda:0'), ...]

wujunyi627 commented 5 years ago

I can run the code with PyTorch 0.4.1. You need to change triplet.py (reid/loss/triplet.py). [screenshot of the change]

xinyi97 commented 2 years ago

> Hi,
>
> I got the same problem recently. I think it is connected to a newer version of pytorch.
>
> What worked for me is replacing torch.cat with torch.stack, but I am not entirely sure if this solution is unproblematic.
>
> Regards Frank

It's useful.