```python
import numpy as np
import tensorflow as tf
from tensorpack import ModelDesc, InputDesc
from tensorpack.models import FullyConnected
from tensorpack.tfutils import summary
from tensorpack.tfutils import symbolic_functions as symbf
# FlexConv, NeighborhoodSubsampling, knn and ReLU come from the
# flex-convolution user ops / helpers (not shown here).

BATCH_SIZE = 16
SUB_BATCH_SIZE = 4
NUM_POINTS = 1024
K = 8
FEATURE_LEN = 128
TREE_DEPTH = 11


class Model(ModelDesc):
    def _get_inputs(self):
        return [InputDesc(tf.float32, (BATCH_SIZE, NUM_POINTS, 3), 'point'),
                InputDesc(tf.float32, (BATCH_SIZE, 2**TREE_DEPTH - 1, 3), 'tree'),
                InputDesc(tf.int32, (BATCH_SIZE,), 'label')]

    def _build_graph(self, inputs):
        _, tree, label = inputs
        # the finest tree level holds all NUM_POINTS points
        level = TREE_DEPTH - 1
        points = tree[:, 2**level - 1:2**(level + 1) - 1, :]
        neighbor_hood = knn(points, k=K, subBatch=SUB_BATCH_SIZE)
        features = tf.ones([BATCH_SIZE, NUM_POINTS, 16])

        # FlexConv expects channels-first layout [B, C, N]
        features = tf.transpose(features, [0, 2, 1])
        neighbor_hood = tf.transpose(neighbor_hood, [0, 2, 1])
        position = tf.transpose(points, [0, 2, 1])
        features = FlexConv('conv_pre', features, neighbor_hood, position,
                            FEATURE_LEN, nl=ReLU)

        for level in [9, 8, 7, 6, 5, 4, 3]:
            w = 2**level  # number of clusters at this tree level
            # sub-sampling step: halve the number of points
            features = tf.transpose(features, [0, 2, 1])
            features = NeighborhoodSubsampling('subsampling_%i' % level, features)
            features = tf.transpose(features, [0, 2, 1])
            position = tree[:, 2**level - 1:2**(level + 1) - 1, :]
            # _, neighbor_hood = ops.nano_flann(position, k=min(K, w))
            neighbor_hood = knn(position, k=min(K, w), subBatch=SUB_BATCH_SIZE)
            position = tf.transpose(position, [0, 2, 1])
            neighbor_hood = tf.transpose(neighbor_hood, [0, 2, 1])
            features = FlexConv('conv%i_0' % level, features, neighbor_hood,
                                position, FEATURE_LEN, nl=ReLU)
            features = FlexConv('conv%i_1' % level, features, neighbor_hood,
                                position, FEATURE_LEN, nl=ReLU)
            features = FlexConv('conv%i_2' % level, features, neighbor_hood,
                                position, FEATURE_LEN, nl=ReLU)
            # not enough clusters in current level? --> break
            if w <= K:
                break

        # fully connected
        features_shape = features.get_shape().as_list()
        pointcloud_dim = np.prod(features_shape[1:])
        features = tf.reshape(features, [BATCH_SIZE, pointcloud_dim])
        logits = FullyConnected('fc1', features, 40, nl=tf.identity)

        # vanilla classification loss
        cls_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
            logits=logits, labels=label)
        cls_loss = tf.reduce_mean(cls_loss, name="cls_costs")
        accuracy = symbf.accuracy(logits, label, name='accuracy')

        self.cost = tf.identity(cls_loss, name="total_costs")
        summary.add_moving_summary(cls_loss, self.cost, accuracy)
```
That was the network we were using. Pretty straightforward. @grohf, please chime in if there is another version or other hyper-parameters you have chosen :-)
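For reference, the `tree` input above packs a complete binary tree of cluster centers in breadth-first order, so level `l` occupies indices `2**l - 1` through `2**(l+1) - 2`. A quick numpy check of the slicing used in `_build_graph` (the actual tree contents come from the kd-tree builder, which is not shown here):

```python
import numpy as np

TREE_DEPTH = 11
tree = np.zeros((2**TREE_DEPTH - 1, 3), dtype=np.float32)  # 2047 nodes total

for level in range(TREE_DEPTH):
    nodes = tree[2**level - 1:2**(level + 1) - 1]  # slice used in _build_graph
    print(level, len(nodes))                       # 0 -> 1, 1 -> 2, ..., 10 -> 1024
```

The finest level (`TREE_DEPTH - 1`) holds all `NUM_POINTS = 1024` points, and each step down the loop over levels halves the point count.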
Thanks @PatWie. Sorry to bother you again (I know CVPR is close! :S), but can I just confirm my understanding of a few things?
All good if that's the case; they just seem like interesting choices to me, so maybe I'm misunderstanding something...
@PatWie Sorry to be a pain, but I'm even more confused now. 2 additional questions:
If each flex-conv has `FEATURE_LEN ** 2 * Dp = 128 ** 2 * 4 = 65k` parameters (plus change from biases), and there are 7 blocks of 3, doesn't that leave you with 1.4m or so? Not including the initial flex conv or the final dense layer.
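For concreteness, here is the arithmetic behind those numbers (my assumption: each flex-conv kernel has `FEATURE_LEN**2 * Dp` weights, with `Dp = 4` covering the 3 position coordinates plus a constant term):

```python
FEATURE_LEN = 128
Dp = 4                              # 3 position coordinates + 1 constant term
per_conv = FEATURE_LEN**2 * Dp      # 65,536 weights per flex-conv
total = 7 * 3 * per_conv            # 7 levels with 3 flex-convs each
print(per_conv, total)              # 65536 1376256  (~1.4m)
```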
Disclaimer: I should have stated this before: I haven't trained the ModelNet40 models for the paper. I just helped to monitor the losses and the logs (basically baby-sitting the training). I had trained a few models on ModelNet40 with slightly lower accuracy before. (After all, a dumb classifier gets 84% accuracy.)
An additional trick I learned: splitting the channels into groups and applying flex-conv to each group separately helps:
```
[H,W,128]
--> 1x1 conv
--> split
--> separate flex-convs: 8 x [H,W,16]
--> concat
--> 1x1 conv
```
This also reduces the number of parameters from `128**2 * 3` to `16**2 * 3 * 8`, plus some 1x1 kernels. It also made things much faster.
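A minimal sketch of that grouped pattern, in the same TF1-style as the model above. The helper names here are mine, not from the repo; the actual FlexConv op is passed in as a callable with the signature used earlier:

```python
import tensorflow as tf


def conv1x1(name, x, out_channels):
    """Per-point linear map over channels; x is channels-first [B, C, N]."""
    in_channels = x.get_shape().as_list()[1]
    w = tf.get_variable(name + '/W', [in_channels, out_channels])
    return tf.einsum('bcn,cd->bdn', x, w)


def grouped_flex_conv(name, features, neighborhood, position, flex_conv,
                      channels=128, groups=8):
    """Split channels into groups, run a small flex-conv per group, merge.

    flex_conv is the real FlexConv op from the repo, passed in as a callable
    (name, features, neighborhood, position, out_channels) -> features.
    """
    with tf.variable_scope(name):
        features = conv1x1('reduce', features, channels)   # [B, 128, N]
        group_size = channels // groups                    # 128 // 8 = 16
        parts = tf.split(features, groups, axis=1)         # 8 x [B, 16, N]
        parts = [flex_conv('group%i' % i, p, neighborhood, position, group_size)
                 for i, p in enumerate(parts)]
        features = tf.concat(parts, axis=1)                # back to [B, 128, N]
        features = conv1x1('expand', features, channels)
    return features
```

With `Dp = 3` position terms this drops the flex-conv weights from `128**2 * 3` to `8 * 16**2 * 3`, matching the numbers above; the two 1x1 maps add `128**2` weights each.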
Something nobody would write in the paper is that ModelNet40 is not helpful at all. Even with bugs we reached pretty high accuracy.
Are you aware of https://github.com/hkust-vgd/scanobjectnn ?
I totally agree ModelNet40 is... odd. I took solace in finding the supplementary material in this paper and seeing the flower pot vs. plant quiz. I'll have a look at scanobjectnn - thanks!
On the point of pooling, isn't `position = tree[:, 2**level - 1:2**(level + 1) - 1, :]` removing half of the points per iteration? And what is `w` if not the cloud size? And how do you do 7 quarterings of 1024 points?
You are right, I'm under time pressure for CVPR, so sorry for answering late (and short). I cannot stress how much I dislike ModelNet40 results, for a lot of reasons... But anyhow, I should add some notes :)
First, the model used in the paper was a really dumb dual-flex-conv on only 2 hierarchies, with a convolution out of the center and a 2-fc-layer at the end. At that time we used the dataset provided by PointNet2 with 10k points. I was asked to redo the inference with multiple random sets, and I have to admit that it varies quite a bit [~88%-92.5%]...
Later on, in a student project, we re-sampled the original dataset in a way that makes sure to only sample the outer surfaces of objects. (There are a lot of objects with garbage inside...) A simple ResNeXt-style flex-conv w/o down-sampling, with simple max-pooling over the features into FC, is already enough to get into the same range (and sometimes even better...). That strengthens my assumption that global features are more important than relative dependencies for object classification on ModelNet40.
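If I read that right, such a model could look roughly like the sketch below. This is my interpretation, not the actual student-project code; the block count is a guess, and it reuses the hypothetical `grouped_flex_conv` helper sketched earlier:

```python
def resnext_style_classifier(features, neighborhood, position, flex_conv,
                             num_classes=40, num_blocks=4):
    """Flex-conv blocks without down-sampling, then global max-pool + FC.

    features is channels-first [B, C, N]; with no sub-sampling, neighborhood
    and position stay fixed across blocks. num_blocks is an assumption, not
    a reported hyper-parameter.
    """
    for i in range(num_blocks):
        # ResNeXt flavor: grouped flex-conv with an identity shortcut
        features = features + grouped_flex_conv(
            'block%i' % i, features, neighborhood, position, flex_conv)
    features = tf.reduce_max(features, axis=2)       # max-pool over points -> [B, C]
    return tf.layers.dense(features, num_classes)    # logits for 40 classes
```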
But to make that clear: in my opinion, results above 90% on ModelNet40 should be taken with a big grain of salt.
I hope you got some insights :)
@grohf I appreciate your time - and I agree, pushing the limits of accuracy on ModelNet is not a pursuit I'm interested in. My work looks at simplifying and downscaling models, and my own CVPR submission reproduces this work as a baseline, so I want to do as fair a job as possible.
Could you elaborate on what you mean by "dumb dual-flex-conv"?
In terms of the rest of the training details, I'll be assuming the following, but please let me know if there's anything I should change.
Thanks again.
Hi, I'm trying to reproduce your ModelNet results as accurately as possible and I'm hoping you can provide more details on the architecture used. Specifically:
Apologies if this information is somewhere and I'm failing to find it...