charlesq34 / frustum-pointnets

Frustum PointNets for 3D Object Detection from RGB-D Data
Apache License 2.0

Detections in own point clouds #23

Open fferroni opened 6 years ago

fferroni commented 6 years ago

Hello,

What is the required orientation of the extracted point clouds? I tried running the PointNet on a different extracted frustum of points, but the results were strange.

I tried to visualize the point clouds in the pickled files, and it appears that the frustum direction is oriented along the z-axis. From this I can infer that the points are in camera coordinates, rather than lidar coordinates. Can you confirm this? Does the network only work like this?

I am using a batch size of 1, with 1024 points, and do something like this:

            # One-hot class vector over (Car, Pedestrian, Cyclist); index 1 = Pedestrian
            class_vector = np.zeros((1, 3))
            class_vector[0, 1] = 1

            # Batch of 1 frustum, 1024 points, 4 channels (x, y, z, intensity)
            point_cloud = np.zeros((1, 1024, 4), dtype=np.float32)
            indices = np.arange(0, len(frustum))

            if len(frustum) >= 1024:
                # enough points: subsample without replacement
                choice = np.random.choice(indices, size=1024, replace=False)
            else:
                # too few points: oversample by repeating existing points
                choice = np.random.choice(indices, size=1024, replace=True)

            point_cloud[0] = frustum[choice]

I randomly sample 1024 points from the extracted frustum (are there better ways?) and feed this, along with the other two placeholders for class ID and training phase. I give the point cloud in camera coordinates. In this example, I set the class to pedestrian: https://imgur.com/a/yWo0nn1

            feed_dict = {
                self.pointclouds_pl: point_cloud,
                self.one_hot_vec_pl: class_vector,
                self.is_training_pl: False,
            }

            batch_logits, batch_centers, \
            batch_heading_scores, batch_heading_residuals, \
            batch_size_scores, batch_size_residuals = \
                self.session.run([
                    self.logits, self.center,
                    self.end_points['heading_scores'], self.end_points['heading_residuals'],
                    self.end_points['size_scores'], self.end_points['size_residuals']],
                    feed_dict=feed_dict)

            batch_seg_prob = softmax(batch_logits)[:, :, 1]  # BxN
            batch_seg_mask = np.argmax(batch_logits, 2)  # BxN
            mask_mean_prob = np.sum(batch_seg_prob * batch_seg_mask, 1)  # B,
            mask_mean_prob = mask_mean_prob / np.sum(batch_seg_mask, 1)  # B,
            heading_prob = np.max(softmax(batch_heading_scores), 1)  # B
            size_prob = np.max(softmax(batch_size_scores), 1)  # B,
            batch_scores = np.log(mask_mean_prob) + np.log(heading_prob) + np.log(size_prob)

            filtered_frustums.append(point_cloud[batch_seg_mask == 1].astype(np.float32))

However, when I visualize the points belonging to the object, they don't look right... https://imgur.com/a/3MrEBGQ

Is there some further normalization that I need to apply to the point cloud? I looked into test.py, but I can't really tell. Am I misinterpreting what batch_seg_mask is?

Anyhow, great work!

charlesq34 commented 6 years ago

Hi @fferroni

Yes, we are working in the KITTI rectified camera coordinate system. If you want to test on another dataset, you need to verify that the axis directions are consistent with KITTI's and make sure the camera height is similar to KITTI's.

btw, the links provided seem not to work anymore.
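
As a concrete illustration of the axis convention (a minimal sketch, not code from this repo): KITTI's rectified camera frame has x pointing right, y down, and z forward, while a typical velodyne frame has x forward, y left, z up. `Tr_velo_to_cam` below is a stand-in for whatever 4x4 extrinsic your own calibration gives you.

    import numpy as np

    def lidar_to_kitti_cam(points_lidar, Tr_velo_to_cam):
        """Re-express (N, 3) lidar xyz in a KITTI-style rectified camera frame.

        Tr_velo_to_cam is a hypothetical 4x4 homogeneous extrinsic from your
        own calibration; in KITTI it maps x-forward/y-left/z-up velodyne
        coordinates to x-right/y-down/z-forward camera coordinates.
        """
        homo = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])  # (N, 4)
        return (Tr_velo_to_cam @ homo.T).T[:, :3]  # (N, 3) in the camera frame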

fferroni commented 6 years ago

Hi @charlesq34, thanks for the fast reply. I fixed the camera height and used the same axes and now it works pretty well ;-)

Can you comment on sampling strategies for the frustum points?

  1. In cases where there are fewer points than the nb_points parameter in the network, is it better to set them all to zero, or to oversample the actual points?
  2. In cases where there are substantially more points than the nb_points parameter, can you comment on optimal sampling strategies? I am sampling randomly, but am wondering how you got the performance on the KITTI benchmark.

BR, Francesco

charlesq34 commented 6 years ago

Hi @fferroni

No problem -- glad it works. It's better to oversample the actual points by repeating existing ones (chosen at random) -- setting them to zero introduces an unwanted bias into the estimation.

Random sampling is mostly fine (we used it for our evaluation), but it is surely not optimal. You are welcome to try different sampling schemes -- if you care more about faraway objects, maybe keep more points at distance.

Hope it helps!
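
A minimal sketch of this sampling scheme (my reading of the advice above, not code from the repo), plus one possible distance-weighted variant for keeping faraway points:

    import numpy as np

    def sample_frustum_points(frustum, n_points=1024):
        """Resample an (N, C) frustum to exactly n_points rows.

        Repeats random existing points when N < n_points (instead of
        zero-padding) and randomly subsamples otherwise.
        """
        n = len(frustum)
        choice = np.random.choice(n, size=n_points, replace=(n < n_points))
        return frustum[choice]

    def sample_weighted_by_range(frustum, n_points=1024):
        """One way to keep more points at distance: weight each point by its
        squared range so sparse, faraway regions are better preserved."""
        d2 = np.sum(frustum[:, :3] ** 2, axis=1)  # squared range of each point
        choice = np.random.choice(len(frustum), size=n_points,
                                  replace=len(frustum) < n_points,
                                  p=d2 / d2.sum())
        return frustum[choice]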

AramSo commented 6 years ago

@fferroni, did you set up a camera configuration similar to KITTI's? Did you do the calibration (cam2lidar) separately?

@charlesq34, when running detections on my own point clouds with a model trained on KITTI data, I don't think the estimates will be correct, due to the difference in camera and lidar positions. Right? So is the only way to train (including labelling) and detect on my own data? What do you think?

PranjalBiswas commented 5 years ago

Can I use a point cloud obtained from a Kinect camera with this algorithm for 3D object detection?

xhuan28 commented 5 years ago

Hi @fferroni, did you fix the camera height to 1.65 m as described in the KITTI paper? I have the same problem you met before: the output 3D bounding box is strange when I use a RealSense camera sitting on a desk.
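
If it helps, a hypothetical adjustment (my sketch, with assumed names and an assumed sensor height): since the KITTI rectified camera frame has y pointing down and the camera about 1.65 m above the ground, you can shift camera-frame y so the virtual camera height matches KITTI's before feeding the points to the network. `frustum_points` and `sensor_height` below are placeholders for your own data.

    import numpy as np

    # points: (N, 3) xyz in a KITTI-style camera frame (x right, y down, z forward).
    # Hypothetical values -- replace with your own cloud and measured height.
    points = np.asarray(frustum_points, dtype=np.float32)
    sensor_height = 0.75                    # e.g. desk-mounted camera height, in metres
    points[:, 1] += 1.65 - sensor_height    # ground lands at y ~= +1.65, as in KITTI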

gujiaqivadin commented 4 years ago

> Hi @charlesq34, thanks for the fast reply. I fixed the camera height and used the same axes and now it works pretty well ;-)
>
> Can you comment on sampling strategies for the frustum points?
>
> 1. In cases where there are fewer points than the nb_points parameter in the network, is it better to set them all to zero, or to oversample the actual points?
> 2. In cases where there are substantially more points than the nb_points parameter, can you comment on optimal sampling strategies? I am sampling randomly, but am wondering how you got the performance on the KITTI benchmark.
>
> BR, Francesco

Hello, thanks for your ideas on sampling. I also found that random sampling may not be the best strategy for point clouds with more points than KITTI's. Can you recommend a sampling strategy for dense point clouds? I am working on making this better.
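
For what it's worth, one common alternative for dense clouds is farthest point sampling (used in PointNet++, though not in this repo's evaluation as far as this thread says), which keeps spatial coverage more uniform than random sampling. A minimal NumPy sketch, assuming the cloud has at least n_samples points:

    import numpy as np

    def farthest_point_sampling(points, n_samples):
        """Greedy farthest point sampling over (N, 3) xyz; returns indices.

        O(N * n_samples) cost. Assumes len(points) >= n_samples.
        """
        n = len(points)
        chosen = np.zeros(n_samples, dtype=np.int64)
        nearest = np.full(n, np.inf)          # squared dist to nearest chosen point
        chosen[0] = np.random.randint(n)      # arbitrary seed point
        for i in range(1, n_samples):
            # update each point's distance to its nearest already-chosen point
            d = np.sum((points - points[chosen[i - 1]]) ** 2, axis=1)
            nearest = np.minimum(nearest, d)
            chosen[i] = np.argmax(nearest)    # pick the point farthest from the set
        return chosen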