charlesq34 / pointnet

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Training for variable number of points. #213

Open Sharknado888 opened 4 years ago

Sharknado888 commented 4 years ago

In the paper, PointNet is trained on an equal number of points sampled from each point cloud. In principle, the model should be able to handle objects with a variable number of points, since all of its operations are either point-wise or symmetric. The main obstacle is the input data representation during training: batching requires a fixed-size tensor. I wanted to know whether there is any way to train the network with a different number of points per object. Please help :).

tricostume commented 4 years ago

I am also looking for an answer to this question. Did you happen to find anything on this?

btickell commented 4 years ago

I've found some success modifying the network to include a mask. Compute the maximum number of points across the samples in your dataset; call this N. Then initialize PointNet with a modified input point cloud size of (B x N x 4). The 4th dimension is a mask value indicating whether the i-th point is valid. Slice the input into two tensors: points := (B x N x 3) and mask := (B x N x 1).

Then, before the pooling operation, replace all zero-masked elements with -np.inf using tf.fill and tf.where. These features will never be selected by the max-pooling operation, so the output will be a max over valid points only.

Then just add a 4th dimension to all your data and mask the points accordingly.

Sharknado888 commented 4 years ago

@tricostume Hey, I just ended up padding the data, similar to padding in NLP applications. It works, but I feel this method is not ideal. @btickell's method seems better than what I did.
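
Roughly what the padding looks like (a minimal NumPy sketch; `pad_cloud` is just an illustrative name, not from the repo):

```python
import numpy as np

def pad_cloud(points, max_points):
    """Zero-pad a (num_points, 3) cloud to (max_points, 3) and append a
    (max_points, 1) validity mask, matching the (B x N x 4) layout above."""
    num_points = points.shape[0]
    padded = np.zeros((max_points, 3), dtype=points.dtype)
    padded[:num_points] = points
    mask = np.zeros((max_points, 1), dtype=points.dtype)
    mask[:num_points] = 1.0
    return np.concatenate([padded, mask], axis=1)  # (max_points, 4)
```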

btickell commented 4 years ago

Since I received a request asking for code, I removed my domain-specific modifications and will describe roughly how to do this.

To the PointNet constructor function, pass a [B x N x 4] placeholder instead of [B x N x 3], where B is the batch size, N is the maximum number of points, and the added 4th dimension is a 0/1 mask indicating whether a point is valid.

Then split the input placeholder into the point cloud values and the mask vector:

point_cloud, mask = tf.split(point_cloud, [3, 1], 2)  # (B, N, 4) -> (B, N, 3) xyz and (B, N, 1) mask

Then after the last conv2d layer but before the pooling layer, apply the mask and do some reshaping:

net = tf.squeeze(net, axis=2)                 # (B, N, 1, 1024) -> (B, N, 1024)
mask = tf.tile(mask, multiples=[1, 1, 1024])  # extend mask to match net's feature dimension
net = tf.where(tf.equal(mask, 1.0), net, tf.fill(tf.shape(net), -np.inf))  # invalid points -> -inf
net = tf.expand_dims(net, axis=2)             # back to (B, N, 1, 1024) for the pooling op
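
For reference, a self-contained sketch of the same masked max-pool, written here against TF 2.x eager mode (shapes, variable names, and the 0.7 validity rate are illustrative, not from the repo; note that a cloud whose mask is all zeros would pool to -inf):

```python
import numpy as np
import tensorflow as tf

B, N, C = 2, 5, 1024                                   # batch, max points, feature channels
features = tf.random.normal([B, N, C])                 # per-point features before pooling
mask = tf.cast(tf.random.uniform([B, N, 1]) < 0.7, tf.float32)  # 1.0 marks a valid point

mask_tiled = tf.tile(mask, multiples=[1, 1, C])        # (B, N, C)
neg_inf = tf.fill(tf.shape(features), -np.inf)
masked = tf.where(tf.equal(mask_tiled, 1.0), features, neg_inf)

global_feature = tf.reduce_max(masked, axis=1)         # (B, C): max over valid points only
```
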
CharlesGaydon commented 2 years ago

Additionally, PointNet is not supposed to be affected by redundant points, thanks to its max-pooling operation. It is therefore possible to simply sample point clouds with replacement up to a common size (e.g. the maximum number of points across all clouds, or across the clouds in the current batch).

In terms of compute cost this is equivalent to padding, since the placeholder points will be processed anyway, but it is more sound from a theoretical point of view.
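
A minimal sketch of that resampling (NumPy only; the function name is just illustrative):

```python
import numpy as np

def resample_cloud(points, target_size):
    """Resample a (num_points, 3) cloud to exactly target_size points.

    Sampling with replacement duplicates points when the cloud is smaller
    than target_size; max-pooling is unaffected by the duplicates.
    """
    idx = np.random.choice(points.shape[0], size=target_size, replace=True)
    return points[idx]
```

In practice one might sample without replacement whenever a cloud already has at least `target_size` points, so that no points are dropped unnecessarily.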

ListIndexOutOfRange commented 8 months ago

Just to add a little clarification to the answer above: PointNet is exactly invariant to point duplication only in the absence of Batch Normalization layers. With BatchNorm this is no longer true, because duplicated points shift the mean and variance statistics computed over the batch. The effect vanishes on average, but if you plan on using PointNet in a setting where point density matters, you may want to be careful about this.
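
A toy illustration of the point (made up for this comment): duplicating a point leaves the max-pooled feature unchanged but shifts the statistics BatchNorm would compute.

```python
import numpy as np

points = np.array([[1.0, 2.0],
                   [3.0, 0.5]])                      # (N=2, C=2) per-point features
duplicated = np.vstack([points, points[:1]])         # duplicate the first point

print(points.max(axis=0), duplicated.max(axis=0))    # identical: max-pool is invariant
print(points.mean(axis=0), duplicated.mean(axis=0))  # different: normalization statistics shift
```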

adosar commented 5 months ago

> Additionally, PointNet is not supposed to be affected by redundant points, thanks to its max-pooling operation. It is therefore possible to simply sample point clouds with replacement up to a common size (e.g. the maximum number of points across all clouds, or across the clouds in the current batch).
>
> In terms of compute cost this is equivalent to padding, since the placeholder points will be processed anyway, but it is more sound from a theoretical point of view.

@CharlesGaydon So it can be trained with a different number of points per batch (e.g. one batch might have clouds of 13 points, another of 24, and so on), thanks to the convolution operations, which can handle arbitrary input sizes?