I've yet to test the code on PyTorch 0.4. The code works in PyTorch 0.3, so perhaps you can make a Python environment with 0.3. I won't have access to a GPU till Monday, so you'll have to wait before I can test it myself. I suppose this has to do with the introduction of scalars in PyTorch 0.4. However, here's how you can try to debug the code.
Look, the line `objs = [classes[int(x[-1])] for x in output if int(x[0]) == im_id]` fetches the class of the detection. Here are a couple of things you can do. First, in detect.py, go to this line and put it in a try-except block so you can see why the exception is happening. Precisely, type:
```python
# Note: a list comprehension's loop variable isn't visible outside it in
# Python 3, so use an explicit loop to be able to report the bad index.
objs = []
try:
    for x in output:
        if int(x[0]) == im_id:
            objs.append(classes[int(x[-1])])
except IndexError:
    print("Troublesome index {}".format(int(x[-1])))
    assert False
```
This will stop the program at the problematic index that is causing trouble. Ideally, it should be between 0 and 79, since it is the index of the COCO class that has been detected. What you can also do is print the variable `output` after line 210. `output` is a tensor that holds information about the detections; inspect its last column, which is what `int(x[-1])` is retrieving. The last column is basically the index of your COCO class. We cast it to int so we can index the class list with it.
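For illustration, here's a hypothetical single row of `output`, assuming the usual eight-column layout of this detector (batch index, four box corners, objectness, class confidence, class index); the values are borrowed from the dump further down this thread:

```python
import torch

# hypothetical row: [im_id, x1, y1, x2, y2, objectness, class_score, class_idx]
row = torch.tensor([0.0, 67.9070, 164.1937, 174.7524, 386.2670, 0.9999, 0.9997, 16.0])
print(int(row[0]))    # 0  -> index of the image in the batch
print(int(row[-1]))   # 16 -> index into the COCO class list
```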
Do let me know what you get, and maybe we can sort it out. Or just create an environment with PyTorch 0.3 (last time I checked, the conda and pip channels offered it) and run the code in that.
Perfect, thank you for your quick response. I think this is caused by the PyTorch 0.4 version. With your explanation, it is much clearer to me now, and I will try to fix it in my free time. I'll make a pull request if I succeed.
On a side note, I'd be interested in knowing how you cast the (7,) tensor into a (7,1) tensor. Maybe something is off there. Normally in PyTorch, when you have to do such a thing, you'd type:

```python
image_pred_class = image_pred_class.unsqueeze(1)
```

This inserts a dimension of size one at index 1 (or whatever the argument to `unsqueeze` is).
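For instance, as a standalone illustration:

```python
import torch

t = torch.zeros(7)             # shape (7,)
print(t.unsqueeze(1).shape)    # torch.Size([7, 1])
print(t.unsqueeze(0).shape)    # torch.Size([1, 7])
```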
Well, I did something similar that I saw in your code:

```python
image_pred_class = image_pred_class.new(image_pred_class.size(0), 1).fill_(image_pred_class[0])
```

Yeah, I know this is not the best way, but it worked :D
Finally, I found the issue when I compared the structure you wrote against the structure of the "prediction" variable in detect.py. The problem is that PyTorch now handles tensors a little differently, in my opinion. I had the values of the box coordinates and class IDs written in a single column, like this:
```
0.0000   59.8087
0.0000   95.7915
0.0000  313.1047
0.0000  308.8701
0.0000    0.9959
0.0000    0.9978
0.0000    1.0000
0.0000  256.9860
0.0000   63.2367
0.0000  374.7674
0.0000  120.6163
0.0000    0.9986
0.0000    0.8848
0.0000    7.0000
0.0000   67.9070
0.0000  164.1937
0.0000  174.7524
0.0000  386.2670
0.0000    0.9999
0.0000    0.9997
0.0000   16.0000
[torch.cuda.FloatTensor of size (21,2) (GPU 0)]
```
When I add a little dirty hack at line 200 in util.py, it now works perfectly on 0.4:
```python
image_pred_class = image_pred_class.unsqueeze(1)            # (7,)  -> (7, 1)
image_pred_class = torch.transpose(image_pred_class, 0, 1)  # (7, 1) -> (1, 7)
```
I am not making a pull request because this would break your current solution, and it isn't elegant either. I think someone with more experience could fix it better. Thank you for your help!
Okay. What exactly have you printed above? The (21,2) tensor? Downloading 0.4 now.
Sorry for not being clear. The above (21,2) tensor is the output of the method "write_results", which is called at line 183 in detect.py. It is also the content of the variable named "prediction". I ran the detection with only a single image (dog-cycle-car.png), which has 3 detectable objects in it. That explains the (21,2) structure (3 objects × 7 values per detection stacked into 21 rows next to the batch-index column, instead of one row per detection) and the "list index out of range" error I mentioned before.
All this can be fixed just by transposing the tensor in util.py. I only added

```python
image_pred_class = image_pred_class.unsqueeze(1)
image_pred_class = torch.transpose(image_pred_class, 0, 1)
```

at line 200 in util.py and it started working.
I tried compiling 0.4, but the compilation fails on my system (OS X 10.13.3; I hate compiling stuff on a Mac, and use Ubuntu at work). I chose PyTorch 0.3 because most people are on that, and compilation can give you errors.
Can you try whether your solution works on an entire folder of images, as well as on video (video_demo.py)? If yes, can you make a PR with a hotfix along the lines of:

```python
if torch.__version__[2] == "4":
    image_pred_class = image_pred_class.unsqueeze(1)
    image_pred_class = torch.transpose(image_pred_class, 0, 1)
```
P.S. `torch.__version__` returns a string like "0.4.0post...", so `torch.__version__[2]` picks out the minor version digit.
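Indexing a fixed position in the version string is a bit fragile (it would break on a hypothetical "0.10", for example); a slightly more defensive sketch of the same check could be:

```python
import torch

# torch.__version__ is a plain string, e.g. "0.4.0" or "0.4.0post4"
major, minor = (int(v) for v in torch.__version__.split(".")[:2])
if (major, minor) >= (0, 4):
    image_pred_class = image_pred_class.unsqueeze(1)
    image_pred_class = torch.transpose(image_pred_class, 0, 1)
```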
If you can check that the above hotfix works for 0.4 with the 3 cases we have (1 image, a folder of images, and video), then I'd really appreciate it if you could make a pull request.
I'll try to fix the issue properly once I have my hands on an Ubuntu system.
Thanks.
Ok, so I tried on the entire folder and found out that this hotfix doesn't work. The reason is that if there are multiple objects of the same class in one image, the variable "image_pred_class" is not just (7,) but turns into (X,7), where X is the number of objects of that class in the image. So the unsqueeze gives it another dimension and turns it into (X,1,7), which the transpose then makes (1,X,7), and that is obviously wrong.
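To spell out the shapes with a standalone snippet:

```python
import torch

many = torch.zeros(3, 7)          # three detections of the same class
t = many.unsqueeze(1)             # (3, 1, 7)
t = torch.transpose(t, 0, 1)      # (1, 3, 7) -- three-dimensional, not the (3, 7) the code needs
```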
The solution would be to make image_pred_class two-dimensional independently of the number of objects of the same class in the image. This is what's now broken by the PyTorch 0.4 incompatibility.
To sum up: image_pred_class is a one-dimensional vector (it should be a 1×7 row) when there is only one object of a kind in the image, and a 2-D matrix (which is correct) when there are more objects of the same kind. So this has to be fixed to run on 0.4, but I don't know how. Maybe someone more experienced with the same problem and the 0.4 version will look into it.
Okay. Thanks for your effort. I'm closing this issue as of now; I'll try to solve it when I get my hands on 0.4 this week. However, I guess it's okay to defer this issue at least until 0.4 becomes the version available in the conda/pip channels, since a lot of people get their PyTorch from there. Meanwhile, I'll update the README. I think I'll focus on the ability to train the model on COCO before I get back to this issue.
Can you, btw, check whether the folder version works normally on PyTorch 0.4 (without the hotfix)? Stayed up all night trying to compile PyTorch 0.4 on macOS, and it wouldn't go through :(
No, it won't run on the folder, because some images contain only one object of a class, but there is also an image with multiple horses. So it will crash with or without the hotfix.
Hey, seems like I've solved what's causing the screw-up. It has to do with how PyTorch slices a tensor. You see, the piece of code that generates `image_pred_class` is:

```python
image_pred_class = image_pred_class[non_zero_ind]
```

What we're basically doing here is slicing a tensor with the indices of the values we need. Now, this is supposed to return a tensor with shape (n,7), where n is the number of detections belonging to a particular class.
This works identically in both 0.3 and 0.4 when n is not equal to 1, but when n = 1, 0.3 returns a (1,7) tensor while 0.4 returns a (7,) tensor. The (7,) tensor returned by 0.4 leads to big screw-ups, as we subsequently initialize `batch_ind`, a tensor of size (n,1), to hold the index of the batch the image belongs to. The way we do that is like this:

```python
batch_ind = image_pred_class.new(image_pred_class.size(0), 1).fill_(ind)
```
So `batch_ind` is a tensor with the same number of rows as `image_pred_class` and one column. This works fine when `image_pred_class` is of the form (n,7). However, in our case `image_pred_class` is (7,), so `size(0)` is 7 and `batch_ind` becomes a (7,1) tensor; that is the reason why you had to transpose `image_pred_class` to make it work.
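A standalone snippet makes the shape mismatch visible (values are made up):

```python
import torch

row = torch.zeros(7)      # image_pred_class as 0.4 returns it when n == 1
mat = torch.zeros(1, 7)   # the (1, 7) shape that 0.3 returned

ind = 0
print(row.new(row.size(0), 1).fill_(ind).shape)  # torch.Size([7, 1]) -- wrong
print(mat.new(mat.size(0), 1).fill_(ind).shape)  # torch.Size([1, 1]) -- intended
```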
I've used such indexing at three places to get various slices (based on the confidence threshold, then to choose one class at a time, and then according to the NMS threshold), and the problem occurs at every place where a slice returns a single row, which then comes back as a 1-D tensor.
The fix, as of now, is pretty easy: at the time of creation of these slices, force them to be two-dimensional tensors with the number of columns fixed at 7, so the number of rows is decided accordingly.

```python
image_pred_class = image_pred_class[non_zero_ind].view(-1, 7)
```
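Here's a standalone sketch of the difference, assuming `non_zero_ind` comes from `torch.nonzero(...).squeeze()` (which collapses to a 0-dim index when there is a single match):

```python
import torch

pred = torch.arange(35.0).view(5, 7)   # pretend: 5 detections, 7 values each
# exactly one row matches, so squeeze() yields a 0-dim index tensor
non_zero_ind = torch.nonzero(pred[:, -1] == 34.0).squeeze()

print(pred[non_zero_ind].shape)              # torch.Size([7]) on 0.4
print(pred[non_zero_ind].view(-1, 7).shape)  # torch.Size([1, 7]) -- forced 2-D
```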
I have created a branch called `pyt4` which contains the fixes. I'd appreciate it if you could pull it and test it. I'll merge it into master after testing it a bit more.
Now, I realize I've been a bit lazy while designing this piece of code. A better way to index might have been the function `torch.index_select` (which I wasn't aware of at the time of writing, and have since been too much of a lazy bum to switch to). PyTorch 0.4 also comes with a function `torch.where`, which lets you slice on the basis of a boolean expression, something that wasn't available in 0.3 and forced me to generate slices this way. Gotta give it a try. I'm closing this issue, but if `torch.where` works better, I'll make a comment here.
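For reference, a rough sketch of both alternatives (illustrative only, not the repo's actual code):

```python
import torch

pred = torch.arange(35.0).view(5, 7)

# index_select keeps the result 2-D even when a single row is selected
one_row = torch.index_select(pred, 0, torch.tensor([3]))  # shape (1, 7), never (7,)

# torch.where (0.4+) selects elementwise on a boolean condition
capped = torch.where(pred > 20.0, torch.full_like(pred, 20.0), pred)
```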
Hi, firstly, thank you for your work on this repo. I tried to run your code, but I get a "list index out of range" exception.
I tried to debug what is happening in the code, but it is not very clear to me. I only noticed that in my case one tensor's size is (7,) and the other is (7,1). I am also attaching a screenshot.
So I modified the (7,)-sized tensor into a (7,1) tensor, but then I get another exception further on in the code.
Could you please tell me what I am doing wrong? It is my first time working with PyTorch, so I am not very experienced and I don't know how to fix it myself.
Using Ubuntu 16.04, Python 3.6, CUDA 9.0, PyTorch 0.4. Thanks