facebookresearch / votenet

Deep Hough Voting for 3D Object Detection in Point Clouds

More metrics to understand what the model is learning #57

Open imntl opened 5 years ago

imntl commented 5 years ago

Hey there,

I would like to ask how you evaluated the learning phase of the model. I have created my own dataset and extended VoteNet to full 3D rotation, but I don't get any good results yet. From my understanding, plotting histograms of the weights, biases, and their gradients shows that the model isn't learning anything, and I wanted to know whether you had further debugging mechanisms to see better what is happening inside.
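For reference, this is roughly how I log those histograms (a minimal sketch; `net` and `global_step` are placeholders from my training loop):

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("runs/debug")

def log_histograms(model, step):
    # One histogram per parameter tensor and per gradient tensor.
    for name, param in model.named_parameters():
        writer.add_histogram(f"weights/{name}", param.detach().cpu(), step)
        if param.grad is not None:
            writer.add_histogram(f"grads/{name}", param.grad.detach().cpu(), step)

# Called after loss.backward() in the training loop:
# log_histograms(net, global_step)
```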

alar0330 commented 5 years ago

What about the individual losses? Are all of them not converging?

In order to generalize VoteNet to 3D rotations you would have had to adapt the bbox parametrization. Have you tried visualizing the point cloud and your parametrized bboxes? What about bbox centroids and the ground truth for the votes? I would do that first to make sure that the input to the model is correctly prepared. Besides that, you would probably want to visualize the output of the model as well; see e.g. "dump_helper.py" for inspiration.
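To watch each loss term on its own, something like this is usually enough to see which term stalls (a rough sketch; it assumes your fork still stores the individual terms in `end_points` with keys like `vote_loss`, as the original loss_helper.py does):

```python
import torch

# end_points is the dict returned by the model + criterion, as in train.py.
loss_terms = {k: v.item() for k, v in end_points.items()
              if k.endswith("loss") and torch.is_tensor(v)}
for name, value in sorted(loss_terms.items()):
    # e.g. vote_loss, objectness_loss, box_loss, sem_cls_loss, loss
    print(f"{name}: {value:.4f}")
```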

imntl commented 5 years ago

They are converging, but I think not far enough.

I rewrote the SUN RGB-D dataloader to fit my dataset and prepared everything to the same state as the SUN RGB-D data. All the visualizations that came with it work correctly (for all the objects), so nothing yet has suggested that I did not prepare the input to the model correctly.

So what makes me think that the model doesn't learn anything is that the mAP does not go above 0.1, and the visualized output of the model (from demo.py) mostly shows the right number of bounding boxes (especially precise if only one object is in the scene), but the boxes don't fit very well, or at all.
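To separate "the boxes are roughly found" from "the boxes fit tightly", I evaluate at a loose and a strict IoU threshold, roughly like this (based on eval.py and ap_helper.py from this repo; `eval_loader`, `net`, `DATASET_CONFIG`, and `CONFIG_DICT` are placeholders from my setup):

```python
import torch
from ap_helper import APCalculator, parse_predictions, parse_groundtruths

# One AP calculator per IoU threshold; class2type maps class index -> name.
calculators = {iou: APCalculator(iou, DATASET_CONFIG.class2type)
               for iou in (0.25, 0.5)}

net.eval()
for batch_data in eval_loader:
    inputs = {"point_clouds": batch_data["point_clouds"]}
    with torch.no_grad():
        end_points = net(inputs)
    for key in batch_data:          # merge the labels back in, as eval.py does
        if key not in end_points:
            end_points[key] = batch_data[key]
    pred = parse_predictions(end_points, CONFIG_DICT)
    gt = parse_groundtruths(end_points, CONFIG_DICT)
    for calc in calculators.values():
        calc.step(pred, gt)

for iou, calc in calculators.items():
    print(f"--- IoU {iou} ---")
    for metric, value in calc.compute_metrics().items():
        print(metric, value)        # per-class AP/recall plus mAP and AR
```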

The two things I am not yet sure about are whether I implemented the 3D rotations correctly (I think I did, but I can't be sure) and whether I have a valid dataset. At the moment I'm rendering depth images with Blender from 10 different CAD files in different scenes. Nothing special so far, but I think that the point clouds from the depth images are still too perfect, even after the augmentation copied from the SUN RGB-D dataset.
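To make the synthetic clouds less perfect, I'm experimenting with something like this on top of the copied augmentation (all noise parameters are ad-hoc guesses, not tuned values):

```python
import numpy as np

def degrade_point_cloud(pc, jitter_std=0.005, depth_noise=0.01, dropout_ratio=0.1):
    """pc: (N, 3+C) array; returns a noisier, sparser copy."""
    pc = pc.copy()
    # Per-point Gaussian jitter (generic measurement noise).
    pc[:, :3] += np.random.normal(0.0, jitter_std, pc[:, :3].shape)
    # Distance-dependent noise: error grows with range, as on real RGB-D sensors.
    dist = np.linalg.norm(pc[:, :3], axis=1, keepdims=True)
    pc[:, :3] += np.random.normal(0.0, depth_noise, pc[:, :3].shape) * dist
    # Random dropout to mimic missing returns and occlusion holes.
    keep = np.random.rand(pc.shape[0]) > dropout_ratio
    return pc[keep]
```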

I tried to adapt visualizations like vanilla backpropagation from utkuozbulak/pytorch-cnn-visualizations as colored PLYs, but have not gotten it to work yet.
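What I'm attempting looks roughly like this (a sketch, not working code yet; the `point_clouds`/`objectness_scores` keys are the ones VoteNet uses in its input and `end_points` dicts):

```python
import numpy as np
import torch

# Backprop a scalar score to the input cloud; color points by gradient magnitude.
pc = batch["point_clouds"].clone().requires_grad_(True)      # (B, N, 3+C)
end_points = net({"point_clouds": pc})
end_points["objectness_scores"].max().backward()             # any scalar of interest

saliency = pc.grad[0, :, :3].norm(dim=1)                     # per-point |grad|
saliency = (saliency / (saliency.max() + 1e-8)).cpu().numpy()
xyz = pc[0, :, :3].detach().cpu().numpy()
colors = np.zeros((len(saliency), 3), dtype=np.uint8)
colors[:, 0] = (255 * saliency).astype(np.uint8)             # red = important

# Minimal ASCII PLY writer, so nothing repo-specific is needed.
with open("saliency.ply", "w") as f:
    f.write("ply\nformat ascii 1.0\n"
            f"element vertex {len(xyz)}\n"
            "property float x\nproperty float y\nproperty float z\n"
            "property uchar red\nproperty uchar green\nproperty uchar blue\n"
            "end_header\n")
    for p, c in zip(xyz, colors):
        f.write(f"{p[0]} {p[1]} {p[2]} {c[0]} {c[1]} {c[2]}\n")
```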

kentaroy47 commented 4 years ago

@imntl Did boxnet work for you? Boxnet may be easier to converge.

imntl commented 4 years ago

@kentaroy47 I haven't had time to try it with boxnet yet. So far I don't need the bounding boxes, so I just used PointNet++ for classification and segmentation purposes, which worked well on my own synthesized dataset. At the moment I'm trying to get it to work with depth images from different cameras. Still, it would be nice to have some more visualizations of the inner workings of the model, plotted on a point cloud or something along those lines, just to see which parts of the point cloud matter to the model and to better understand how to get a good abstraction to real-world data.

ch-sa commented 4 years ago

Hey @imntl,

did you manage any progress on this problem? I am looking into the same issue, especially understanding the effect of including 3D pose on performance. I would really like to exchange notes on this topic!

Greetings from Dresden!

imntl commented 4 years ago

I'm sorry @ch-sa, I have not made any progress yet. I think there should be a way to visualize the data for 3D point clouds in the same way projects like Grad-CAM do for images, but it is probably a lot of work.

IliasMAOUDJ commented 3 years ago

Hello, I know this post is quite old, but I think the tips given are not complete. Beyond extending the input parametrization to 3D rotation, I think we must also modify the IoU and NMS functions, since they are axis-aligned. Maybe there is more to do; I'm investigating this...
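A rough sketch of what I mean for the BEV part (assuming shapely is installed; the corner ordering is up to your box parametrization, and a full 3D IoU would additionally need the vertical overlap):

```python
import numpy as np
from shapely.geometry import Polygon

def rotated_bev_iou(corners_a, corners_b):
    """corners_*: (4, 2) BEV corners of a rotated box, in winding order."""
    pa, pb = Polygon(corners_a), Polygon(corners_b)
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0 else 0.0

def rotated_nms(boxes, scores, iou_thresh=0.25):
    """boxes: list of (4, 2) corner arrays; greedy NMS on rotated BEV IoU."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        order = np.array([j for j in order[1:]
                          if rotated_bev_iou(boxes[i], boxes[j]) < iou_thresh])
    return keep
```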

vinayver198 commented 3 years ago

Hi @imntl, I also tried training on a 3D dataset with three rotation angles, but using frustum-pointnet, and faced issues similar to the ones you stated in your previous comments. Have you made any progress since those experiments?

Hi @IliasMAOUDJ, have you made any progress in your investigations, and do you have any pointers?

Thanks