charlesq34 / pointnet

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

What does the visualization look like after TNet matmul input? #137

Closed kxhit closed 5 years ago

kxhit commented 5 years ago

Thanks for the excellent work! The authors say, "We predict an affine transformation matrix by a mini-network and directly apply this transformation to the coordinates of input points." I'm curious what the point cloud looks like after the transformation. Is it a three-dimensional representation that a person can still understand?

kuynzereb commented 5 years ago

I am not the author, but I have experimented with a PointNet-like architecture and visualized the transformed point clouds. I was surprised to see that the transformed point clouds are almost the same as the original ones, and when I printed the transformation matrices they were indeed very close to the identity. So now I think the input TNet doesn't add much. In my experiments, results were slightly better without the input TNet.
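The check described above can be sketched in a few lines of numpy (the repo itself is TensorFlow; `identity_deviation` is an illustrative helper name, not from the codebase): measure how far each predicted 3x3 TNet matrix is from the identity.

```python
import numpy as np

def identity_deviation(tnet_matrices):
    """Mean Frobenius distance between each predicted 3x3 TNet matrix
    and the identity. tnet_matrices: (B, 3, 3) array."""
    eye = np.eye(3)
    return float(np.mean(np.linalg.norm(tnet_matrices - eye, axis=(1, 2))))

# Made-up batch of matrices that are nearly identity, mimicking the
# behavior observed above:
batch = np.stack([np.eye(3) + 0.01 * np.random.randn(3, 3) for _ in range(4)])
print(identity_deviation(batch))  # small, since the matrices are near identity
```

If the printed deviation stays near zero over a trained model's test batches, the input TNet is effectively a no-op, which matches the observation above.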

romanskie commented 5 years ago

Hey @kuynzereb,

can you maybe explain to me how testing of the trained model works?

I mean, during training an affine transformation is learned via the spatial transformer network for the points and features of every input point cloud. But how does this help and actually work when testing the trained model? Does the model then try to transform the individual point clouds from the test set with one of the learned affine transformations in order to predict them?

Maybe you have an idea here.

Thanks in advance!

BG Roman

kxhit commented 5 years ago

@kuynzereb Hi! I ran evaluate.py with the trained model (parameters set to default). I got the input_points (4x1024x3), the transformed_points (4x1024x3), and the TNet output (4x3x3). I converted the input and the output to PCD files and visualized them with pcl_viewer.

  1. The points are rotated and scaled into a cube of roughly 1 m³.
  2. In the PointNet paper, the authors want the transformation matrix to be close to an orthogonal matrix. But the TNet matrix (3x3) I get is not close to orthogonal and still gives a large loss on this term: L_reg = ||I - AA^T||.
  3. TNet is invertible. Does TNet have other special properties in linear algebra?
  4. I trained the model without TNet1 and got slightly worse results than the original.
     Original: eval mean loss: 0.498235, eval accuracy: 0.891234, eval avg class acc: 0.861447
     Without TNet1: eval mean loss: 0.777766, eval accuracy: 0.868912, eval avg class acc: 0.843538

Data and logs can be found here.
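For reference, the regularization term mentioned in point 2, L_reg = ||I - AA^T||, can be sketched in numpy like this (the official code computes it in TensorFlow on the feature transform; `orthogonality_loss` is an illustrative name, not from the repo):

```python
import numpy as np

def orthogonality_loss(mats):
    """Frobenius norm of (I - A A^T), averaged over the batch.
    mats: (B, K, K) batch of predicted transform matrices."""
    k = mats.shape[-1]
    eye = np.eye(k)
    diff = eye - mats @ np.transpose(mats, (0, 2, 1))
    return float(np.mean(np.linalg.norm(diff, axis=(1, 2))))

# A pure rotation gives zero loss; a uniformly scaled rotation does not,
# which would explain a large value of this term for a rotation+scale TNet.
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
print(orthogonality_loss(rot[None]))        # ~0 for a rotation
print(orthogonality_loss(2.0 * rot[None]))  # large for a scaled rotation
```

This matches the observation that a TNet learning rotation + scale will show a large value on this term even though the rotation part is fine.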

@romanskie During testing, the parameters saved during training are restored first. Then the input cloud points (BatchxNx3) are fed to TNet, which outputs a Batchx3x3 matrix. After that, the TNet output is multiplied with the input cloud points to get the transformed cloud points. Hope it helps you.
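The multiplication step above is just a batched matmul. A minimal numpy stand-in (the repo does this with tf.matmul; shapes and the identity matrices here are placeholders for illustration):

```python
import numpy as np

# Hypothetical shapes matching the discussion: a batch of point clouds
# (B, N, 3) and the TNet output (B, 3, 3).
B, N = 4, 1024
points = np.random.randn(B, N, 3)
tnet = np.tile(np.eye(3), (B, 1, 1))  # stand-in for the predicted matrices

# Batched matrix multiply: each cloud is transformed by its own 3x3 matrix.
transformed = points @ tnet  # (B, N, 3)

# With identity matrices the points are unchanged:
print(np.allclose(transformed, points))
```

The key point is that the 3x3 matrix is predicted per input cloud at test time, not looked up from a stored set of training-time transformations.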

romanskie commented 5 years ago

Hey @kxhit,

thanks for your response.

This is my understanding now :) During training, the network learns two transformations (T1 and T2) for each point cloud in the training set, meaning about 5000 x 2 transformations if your training set is about 5000 point clouds? By learning these transformations, the trained model then knows how to transform the point clouds from the test set in order to predict them?

BG Roman

kxhit commented 5 years ago

@romanskie Yeah, maybe the TNet learns to extract information from the input points and then predicts a transformation matrix (3x3). The matrix then rotates and scales the input points, which may be useful for the later feature extraction.

kuynzereb commented 5 years ago

@kxhit

  1. Yeah, I also visualized points transformed by the default model and got the same results: the points are rotated and upscaled.
  2. Not exactly. The authors add a regularization term only to the feature TNet, because it has a much higher dimension than the input TNet. If I understand correctly, they do it just to simplify the optimization problem. However, it seems like the input TNet learns something like rotation + scale, so it can be considered orthogonal up to scale.
  3. I don't know; I think we shouldn't expect any special properties unless we explicitly impose constraints, like the regularization term on the feature TNet.
  4. I got the following results with default settings:
     Original: eval mean loss: 0.551, eval accuracy: 0.881, eval avg class acc: 0.856
     Without both TNets: eval mean loss: 0.564, eval accuracy: 0.870, eval avg class acc: 0.843
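The "orthogonal up to scale" claim in point 2 is easy to test numerically: for a uniformly scaled rotation A = sR, the product A A^T equals s²I. A small numpy sketch (`is_scaled_orthogonal` is a hypothetical helper, not from the repo):

```python
import numpy as np

def is_scaled_orthogonal(a, tol=1e-6):
    """Check whether A A^T is (approximately) a multiple of the identity,
    i.e. A is a rotation/reflection composed with a uniform scale."""
    aat = a @ a.T
    s2 = np.trace(aat) / a.shape[0]  # estimated squared scale factor
    return bool(np.allclose(aat, s2 * np.eye(a.shape[0]), atol=tol))

theta = np.pi / 4
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
print(is_scaled_orthogonal(3.0 * rot))            # scale * rotation: True
print(is_scaled_orthogonal(np.array([[1.0, 1.0],
                                     [0.0, 1.0]])))  # shear: False
```

Running this check on a trained input TNet's matrices would confirm or refute whether it really learns only rotation + uniform scale.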

@romanskie I think @kxhit has answered your question :)