ayooshkathuria / pytorch-yolo-v3

A PyTorch implementation of the YOLO v3 object detection algorithm

Tensors have different dimensions #1

Closed lukasbrchl closed 6 years ago

lukasbrchl commented 6 years ago

Hi, firstly, thank you for your work on this repo. I tried to run your code, but I get this exception:

Traceback (most recent call last):
  File "/home/lukas/dev/pytorch-yolo-v3/detect.py", line 183, in <module>
    prediction = write_results(prediction, confidence, num_classes, nms = True, nms_conf = nms_thesh)
  File "/home/lukas/dev/pytorch-yolo-v3/util.py", line 189, in write_results
    output = torch.cat(seq, 0)
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 1 and 2 at /tmp/pip-w41ywlv_-build/aten/src/THC/generic/THCTensorMath.cu:102

I tried to debug what is happening in the code, but it is not very clear to me. I only noticed that in my case, one tensor's size is (7,) and the other is (7,1). I am also attaching a screenshot.
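The mismatch can be reproduced in isolation (a and b are just stand-ins for the two tensors that end up in seq):

import torch

a = torch.zeros(7)      # 1-D tensor, size (7,)
b = torch.zeros(7, 1)   # 2-D tensor, size (7, 1)
torch.cat((a, b), 0)    # RuntimeError: Tensors must have same number of dimensions: got 1 and 2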

So I reshaped the size-7 tensor to (7,1), but then I get another exception further down in the code.

Traceback (most recent call last):
  File "/opt/pycharm-2018.1/helpers/pydev/pydevd.py", line 1664, in <module>
    main()
  File "/opt/pycharm-2018.1/helpers/pydev/pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/opt/pycharm-2018.1/helpers/pydev/pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/opt/pycharm-2018.1/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/lukas/dev/pytorch-yolo-v3/detect.py", line 213, in <module>
    objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]
  File "/home/lukas/dev/pytorch-yolo-v3/detect.py", line 213, in <listcomp>
    objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]
IndexError: list index out of range

Do you know what I am doing wrong? It is my first time working with PyTorch, so I am not very experienced and I don't know how to fix it myself.

Using Ubuntu 16.04, Python 3.6, CUDA 9.0, PyTorch 0.4. Thanks.

ayooshkathuria commented 6 years ago

I have yet to test the code on PyTorch 0.4; it works on PyTorch 0.3, so perhaps you can create a Python environment with 0.3. I won't have access to a GPU till Monday, so you'll have to wait before I can test it myself. I suspect this has to do with the introduction of scalars in PyTorch 0.4. In the meantime, here's how you can try to debug the code.

The line objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id] fetches the class of each detection. Here are a couple of things you can do.

First, in detect.py, go to that line and wrap it in a try-except block so you can see why the exception is happening. Specifically:

try:
    objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]
except IndexError:
    # x[-1] should be a COCO class index in [0, 79]; print the offender and halt
    print("Troublesome index {}".format(int(x[-1])))
    assert False

This will stop the program at the problematic index. Ideally, the index should be between 0 and 79, since it is the index of the COCO class that has been detected. You can also print the variable output after line 210. output is a tensor that holds information about the detections, and its last column is what int(x[-1]) retrieves. Here's a sample:

[screenshot: sample contents of the output tensor]

The last column is basically the index of the detected COCO class; we cast it to int so we can use it to index the classes list.
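In code, the lookup boils down to something like this (the column layout here is my reading of the sample above, not copied verbatim from the repo):

# Each row of output describes one detection; reading off the sample above:
#   [im_id, x1, y1, x2, y2, objectness, class_score, class_idx]
row = output[0]
print(int(row[0]))             # index of the image within the batch
print(classes[int(row[-1])])   # name of the detected COCO class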

Do let me know what you get, and maybe we can sort it out. Or just create an environment with PyTorch 0.3 (last time I checked, the conda and pip channels offered it) and run the code there.

lukasbrchl commented 6 years ago

Perfect, thank you for your quick response. I think this is caused by the PyTorch 0.4 version. With your explanation, it is much clearer to me now, and I will try to fix it in my free time. I'll make a pull request if I succeed.

ayooshkathuria commented 6 years ago

On a side note, I'd be interested in knowing how you cast the (7,) tensor into a (7,1) tensor. Maybe something is off there. Normally in PyTorch, when you have to do such a thing, you'd type:

image_pred_class = image_pred_class.unsqueeze(1)

This will insert a dimension of size one at index 1 (or whatever the argument to unsqueeze is).
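For instance, a quick shape check:

import torch

t = torch.zeros(7)
print(t.unsqueeze(1).shape)   # torch.Size([7, 1]), a column
print(t.unsqueeze(0).shape)   # torch.Size([1, 7]), a row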

lukasbrchl commented 6 years ago

Well, I did something similar to what I saw in your code: image_pred_class = image_pred_class.new(image_pred_class.size(0), 1).fill_(image_pred_class[0]). Yeah, I know this is not the best way, but it worked :D

Finally, I found the issue when I compared the structure you posted against the "prediction" variable in detect.py. The problem is that PyTorch now handles the tensors a little differently, in my opinion. I had the box coordinates and class IDs written in a single column, like this:

   0.0000   59.8087
   0.0000   95.7915
   0.0000  313.1047
   0.0000  308.8701
   0.0000    0.9959
   0.0000    0.9978
   0.0000    1.0000
   0.0000  256.9860
   0.0000   63.2367
   0.0000  374.7674
   0.0000  120.6163
   0.0000    0.9986
   0.0000    0.8848
   0.0000    7.0000
   0.0000   67.9070
   0.0000  164.1937
   0.0000  174.7524
   0.0000  386.2670
   0.0000    0.9999
   0.0000    0.9997
   0.0000   16.0000
[torch.cuda.FloatTensor of size (21,2) (GPU 0)]

When I add a little dirty hack at line 200 in util.py, it works perfectly on 0.4:

image_pred_class = image_pred_class.unsqueeze(1)             # (7,)  -> (7, 1)
image_pred_class = torch.transpose(image_pred_class, 0, 1)   # (7, 1) -> (1, 7), a single detection row

I am not making a pull request because this would break your current solution, and it isn't elegant either. I think someone with more experience could fix it better. Thank you for your help!

ayooshkathuria commented 6 years ago

Okay. What exactly have you printed above? The (21,2) tensor? Downloading 0.4 now.

lukasbrchl commented 6 years ago

Sorry for not being clear. The (21,2) tensor above is the output of the method "write_results", which is called at line 183 in detect.py; it is also the content of the variable named "prediction". I ran the detection on only a single image (dog-cycle-car.png), which has 3 detectable objects in it. That explains the (21,2) structure and the "list index out of range" error I mentioned before.

All this can be fixed just by transposing the tensor in util.py. I only added

image_pred_class = image_pred_class.unsqueeze(1)
image_pred_class = torch.transpose(image_pred_class, 0, 1)

on line 200 in util.py and it started working.

ayooshkathuria commented 6 years ago

I tried compiling 0.4, but compilation fails on my system (macOS 10.13.3; I hate compiling stuff on a Mac and use Ubuntu at work). I chose PyTorch 0.3 because most people are on it, and compiling from source can give you errors.

Can you check whether your solution works on an entire folder of images, as well as on video (video_demo.py)? If yes, can you make a PR with a hotfix along the lines of:

if torch.__version__[2] == "4":
    image_pred_class = image_pred_class.unsqueeze(1)
    image_pred_class = torch.transpose(image_pred_class, 0, 1)

P.S. torch.__version__ returns a string like "0.4.0post...". If you can check that the above hotfix works for 0.4 with the 3 cases we have (1 image, a folder of images, and video), I'd really appreciate it if you could make a pull request.
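Since the string can vary, here's a sturdier sketch of the same check, comparing the (major, minor) prefix instead of a single character (a suggestion, not code from the repo):

major, minor = (int(v) for v in torch.__version__.split(".")[:2])
if (major, minor) >= (0, 4):
    image_pred_class = image_pred_class.unsqueeze(1)
    image_pred_class = torch.transpose(image_pred_class, 0, 1)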

I'll try to fix the issue once I have my hands on an Ubuntu system.

Thanks.

lukasbrchl commented 6 years ago

Ok, so I tried it on the entire folder and found out that this hotfix doesn't work. The reason is that if there are multiple objects of the same class in one image, the variable "image_pred_class" is not (7,) but (X,7), where X is the number of objects of that class in the image. So the unsqueeze gives it another dimension and turns it into (1,X,7), which is obviously wrong.

The solution would be to make image_pred_class two-dimensional regardless of the number of objects of the same class in the image. This is what the PyTorch 0.4 incompatibility currently breaks.

To sum it up: image_pred_class is a one-dimensional vector (it should be a single row) when there is only one object of a kind in the image, and a 2-D matrix (which is correct) when there are more objects of the same kind. So this has to be fixed to run on 0.4, but I don't know how. Maybe someone more experienced with the same problem and the 0.4 version will look into it.

ayooshkathuria commented 6 years ago

Okay, thanks for your effort. I'm closing this issue for now; I'll try to solve it when I get my hands on 0.4 this week. I guess it's okay to defer the issue at least until 0.4 becomes the version available in the conda/pip channels, since a lot of people get their PyTorch from there. Meanwhile, I'll update the README. I think I'll focus on the ability to train the model on COCO before I get back to this.

ayooshkathuria commented 6 years ago

By the way, can you check whether the folder version works normally on PyTorch 0.4 (without the hotfix)? I stayed up all night trying to compile PyTorch 0.4 on macOS, and it wouldn't go through :(

lukasbrchl commented 6 years ago

No, it won't run on the folder, because some images contain only one object of a class, but there is also an image with multiple horses. So it will crash with or without the hotfix.

ayooshkathuria commented 6 years ago

Hey, it seems like I've found what's causing the screw-up. It has to do with how PyTorch slices a tensor. The piece of code that generates image_pred_class is

image_pred_class = image_pred_class[non_zero_ind]

What we're basically doing here is slicing a tensor with the indices of the values we need. This is supposed to return a tensor with shape (n,7), where n is the number of detections belonging to a particular class.

This works identically in 0.3 and 0.4 when n is not equal to 1. But when n = 1, 0.3 returns a (1,7) tensor while 0.4 returns a (7,) tensor. The (7,) tensor returned by 0.4 leads to big screw-ups, because we subsequently initialize batch_ind, a tensor of size (n,1), to hold the index of the batch each image belongs to. The way we do that is like this:

batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)

So, batch_ind is a tensor with the same number of rows as image_pred_class and one column. This works fine when image_pred_class has shape (n,7). In our case, however, image_pred_class is (7,), so batch_ind becomes a (7,1) tensor, and that is why you had to transpose image_pred_class to make it work.
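Here's a minimal sketch of the shape difference (my own repro; I'm assuming the index has been squeezed, which matches the shapes reported above):

import torch

t = torch.arange(14.).view(2, 7)    # two 7-value detection rows
ind = torch.nonzero(t[:, 0] > 5)    # shape (1, 1): only one row matches

# 0.3: squeeze() cannot produce a 0-dim tensor, so t[ind.squeeze()] is (1, 7).
# 0.4: squeeze() yields a scalar, scalar indexing drops the row dimension,
#      and the slice comes out as (7,).
sliced = t[ind.squeeze()]
print(sliced.shape)

# batch_ind then gets one "row" per element of the (7,) slice:
batch_ind = sliced.new(sliced.size(0), 1).fill_(0)
print(batch_ind.shape)              # torch.Size([7, 1]) on 0.4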

I've used this kind of indexing at three places to get various slices (first based on the confidence threshold, then to choose one class at a time, and then according to the NMS threshold), and the problem occurs at every one of them whenever the slice contains a single row, which 0.4 returns as a 1-D tensor.

So, what's the solution?

As of now, it's pretty easy: at the time these slices are created, force them to be two-dimensional tensors with the number of columns fixed at 7, so the number of rows is inferred accordingly.

image_pred_class = image_pred_class[non_zero_ind].view(-1,7)
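A quick sanity check of the shapes (my own snippet):

import torch

print(torch.zeros(7).view(-1, 7).shape)     # torch.Size([1, 7]): the 0.4, n == 1 case
print(torch.zeros(3, 7).view(-1, 7).shape)  # torch.Size([3, 7]): n > 1 is unchanged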

I have created a branch called pyt4 which contains the fixes. I'd appreciate it if you could pull it and test it; I'll merge it into master after testing it a bit more.

Other solutions

Now, I realize I was a bit lazy while designing this piece of code. A better way to index might have been the function torch.index_select (which I wasn't aware of at the time of writing, and I've since been too much of a lazy bum to fix it).

PyTorch 0.4 also comes with a function, torch.where, which lets you select values on the basis of a boolean condition, something that wasn't available in 0.3 and forced me to generate slices this way. Gotta give it a try. I'm closing this issue, but if torch.where works better, I'll make a comment here.
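On that note, plain boolean-mask indexing also sidesteps the one-row squeeze problem; a sketch (my own example, with column 4 standing in for the objectness score):

import torch

pred = torch.rand(10, 7)
mask = pred[:, 4] > 0.5   # 1-D boolean mask over the rows
kept = pred[mask]         # shape (n, 7) for any n, including n == 1
print(kept.shape)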