WeijingShi / Point-GNN

Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud, CVPR 2020.
MIT License

Your network almost never outputs the right rotation values #37

Closed sarimmehdi closed 4 years ago

sarimmehdi commented 4 years ago

Hello. I notice that your neural net almost always outputs a positive angle value and rarely gives a negative one. Is there any reason why your neural net cannot regress to the right angle value?

WeijingShi commented 4 years ago

Hi @sarimmehdi, Do you mean the yaw angle? The yaw direction is predicted in the range [-pi/4, 3pi/4], which is sufficient to generate the 3D bounding boxes. All yaw angles are shifted into that range by adding/subtracting pi. Inside the network, we treat side-view objects [-pi/4, pi/4] and front/back-view objects [pi/4, 3pi/4] as two separate classes. If you want to predict yaw angles over the full [-pi/4, 7pi/4] range for your application, you can add two more prediction headers for [3pi/4, 5pi/4] and [5pi/4, 7pi/4]. Hope this is helpful. Thanks,
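For illustration only (not code from this repository), a minimal sketch of how a KITTI yaw in [-pi, pi] could be folded into [-pi/4, 3pi/4) and assigned to the two view classes described above; the helper name is hypothetical:

import numpy as np

def fold_yaw_two_classes(yaw):
    # Hypothetical sketch: fold a KITTI yaw in [-pi, pi] into [-pi/4, 3pi/4)
    # by adding/subtracting pi, which discards the front/back direction.
    folded = np.mod(yaw + np.pi / 4, np.pi) - np.pi / 4
    # Side-view objects fall in [-pi/4, pi/4), front/back-view objects in [pi/4, 3pi/4).
    view_class = 0 if folded < np.pi / 4 else 1
    return folded, view_class

print(fold_yaw_two_classes(-2.8))  # (~0.34, 0): a yaw of -160 degrees folds to a side-view angle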

sarimmehdi commented 4 years ago

Hi @WeijingShi Can you tell me where I need to make the changes (the name of the file where I need to make the change) so that the output yaw is between -pi and pi (as in the official KITTI labels)? Thanks

WeijingShi commented 4 years ago

Hi @sarimmehdi , The bounding box labels are split and assigned to each point here: https://github.com/WeijingShi/Point-GNN/blob/2baf24f9556907f23e2e4018f1b756dac3f6c497/dataset/kitti_dataset.py#L1184 Then each point's label is encoded here: https://github.com/WeijingShi/Point-GNN/blob/2baf24f9556907f23e2e4018f1b756dac3f6c497/models/box_encoding.py#L231 You can add a new split method that outputs four subclasses, [-pi/4, pi/4], [pi/4, 3pi/4], [3pi/4, 5pi/4], [5pi/4, 7pi/4], and center their yaw values during encoding by subtracting 0, pi/2, pi, 3pi/2 respectively (just one choice; you can try your preferred method). Then use the new methods in training and evaluation:
https://github.com/WeijingShi/Point-GNN/blob/2baf24f9556907f23e2e4018f1b756dac3f6c497/train.py#L70 https://github.com/WeijingShi/Point-GNN/blob/2baf24f9556907f23e2e4018f1b756dac3f6c497/train.py#L113
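A rough sketch of that split (a hypothetical helper, not the actual code in kitti_dataset.py), assuming the yaw has already been wrapped into [-pi/4, 7pi/4):

import numpy as np

def yaw_to_subclass(yaw):
    # Hypothetical sketch: bin a yaw in [-pi/4, 7pi/4) into one of the four
    # subclasses [-pi/4, pi/4), [pi/4, 3pi/4), [3pi/4, 5pi/4), [5pi/4, 7pi/4)
    # and return the offset to subtract during encoding.
    subclass = int(np.mod(yaw + np.pi / 4, 2 * np.pi) // (np.pi / 2))  # 0..3
    offset = subclass * np.pi / 2                                      # 0, pi/2, pi, 3pi/2
    return subclass, offset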

sarimmehdi commented 4 years ago

Hello. Sorry for the late reply. I am unable to follow your instructions. Can you show me what you mean through a simple code segment that shows where I need to make the change? In particular, I would like to know where to add this new split method you mentioned, since I see no mention of angle values there.

EDIT: So, I obtained the angle values as they were from your neural net. Now, exactly what angle value should I add to the predicted rotation value to bring it from your custom range into the official KITTI range, which is between -pi and pi?

WeijingShi commented 4 years ago

Hi @sarimmehdi The variable "yaw" in the code is the bounding box angle that you are looking for. To predict yaw values over the full [0, 2pi] range, you need to modify the code and retrain the network. One way to do that is to have four prediction headers, which handle objects with yaw angles in [-pi/4, pi/4], [pi/4, 3pi/4], [3pi/4, 5pi/4], [5pi/4, 7pi/4] respectively.

There are two main modifications that you need. The first is to change the training labels. In the following assign_classaware_car_label_to_points method, bounding boxes are separated into two categories, [-pi/4, pi/4] and [pi/4, 3pi/4]. You need to change this to four categories: [-pi/4, pi/4], [pi/4, 3pi/4], [3pi/4, 5pi/4], [5pi/4, 7pi/4]. https://github.com/WeijingShi/Point-GNN/blob/2baf24f9556907f23e2e4018f1b756dac3f6c497/dataset/kitti_dataset.py#L1184

The second modification is on the encoding method. In the following classaware_all_class_box_encoding method, training labels are encoded. Specifically, the yaw angles in each category are centered. As you now have four yaw categories [-pi/4, pi/4], [pi/4, 3pi/4], [3pi/4, 5pi/4], [5pi/4, 7pi/4], you need to subtract 0, pi/2, pi, 3pi/2 respectively. https://github.com/WeijingShi/Point-GNN/blob/2baf24f9556907f23e2e4018f1b756dac3f6c497/models/box_encoding.py#L231 The decoding method needs to be changed accordingly. https://github.com/WeijingShi/Point-GNN/blob/2baf24f9556907f23e2e4018f1b756dac3f6c497/models/box_encoding.py#L265
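A minimal sketch of that encode/decode symmetry (hypothetical names, not the actual fields used in classaware_all_class_box_encoding):

import numpy as np

# Offsets for the four yaw subclasses [-pi/4, pi/4), [pi/4, 3pi/4), [3pi/4, 5pi/4), [5pi/4, 7pi/4).
YAW_OFFSETS = (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)

def encode_yaw(yaw, subclass):
    # Center the yaw of each subclass so the regression target stays near zero.
    return yaw - YAW_OFFSETS[subclass]

def decode_yaw(encoded_yaw, subclass):
    # Invert the centering to recover the absolute yaw in [-pi/4, 7pi/4).
    return encoded_yaw + YAW_OFFSETS[subclass]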

If you add new methods instead of modifying the existing ones, make sure you use the new methods in train.py and run.py.

After retraining the network, if you need [-pi, pi], you can simply apply np.mod(yaw + pi, 2*pi) - pi.
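For example, a small sketch of that wrap:

import numpy as np

def wrap_to_pi(yaw):
    # Wrap any yaw (e.g. a decoded value in [-pi/4, 7pi/4)) into [-pi, pi).
    return np.mod(yaw + np.pi, 2 * np.pi) - np.pi

print(wrap_to_pi(3 * np.pi / 2))  # -> -pi/2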

sarimmehdi commented 4 years ago

Is there any reason why you are not regressing to angles between -pi and pi? This probably has a bad effect on your accuracy when you submit to the KITTI leaderboard.

WeijingShi commented 4 years ago

Hi @sarimmehdi, The mAP computes the overlap area between the predicted bounding box and the ground-truth bounding box. Changing a box's yaw angle to yaw - pi does not affect its overlap. Therefore, to keep the network simple, we just use two headers, [-0.25pi, 0.25pi] and [0.25pi, 0.75pi], instead of four.
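As a small illustrative check (not repository code), rotating a box footprint by yaw and by yaw - pi produces the same set of corners, so the overlap is identical:

import numpy as np

def bev_corners(cx, cz, l, w, yaw):
    # Corners of a box footprint in the ground plane, rotated by yaw around its center.
    c, s = np.cos(yaw), np.sin(yaw)
    local = np.array([[l/2, w/2], [l/2, -w/2], [-l/2, -w/2], [-l/2, w/2]])
    rot = np.array([[c, -s], [s, c]])
    return local @ rot.T + np.array([cx, cz])

a = bev_corners(1.0, 2.0, 4.0, 1.8, 0.3)
b = bev_corners(1.0, 2.0, 4.0, 1.8, 0.3 - np.pi)
# The corner sets match up to ordering, so the footprint (and hence the IoU) is unchanged.
print(np.allclose(np.sort(a, axis=0), np.sort(b, axis=0)))  # True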

sarimmehdi commented 4 years ago

Since you are using your own orientation values, can you provide a script that allows us to draw the bounding boxes in the right way on the image plane? This would be much more useful than trying to retrain the entire network from scratch so that it can output angle values in the correct range.

As of now, this is the script I use to draw 3D bounding boxes and it requires angles in the range of -pi and pi:

import numpy as np
import cv2
from math import cos, sin

def plot_3d_bbox(img, calib, bbox3d_center, bbox3d_dims, bbox3d_roty):
    box_3d = []
    box_pts = []
    h, w, l = bbox3d_dims
    # Eight box corners in the object frame: bottom face at y=0, top at y=-h,
    # length l along x and width w along z (KITTI camera coordinates, y points down).
    p0, p1, p2, p3 = (np.array([l/2, 0, w/2]), np.array([-l/2, 0, w/2]),
                      np.array([-l/2, 0, -w/2]), np.array([l/2, 0, -w/2]))
    p4, p5, p6, p7 = (np.array([l/2, -h, w/2]), np.array([-l/2, -h, w/2]),
                      np.array([-l/2, -h, -w/2]), np.array([l/2, -h, -w/2]))
    pts_array = np.array([p0, p1, p2, p3, p4, p5, p6, p7]).transpose()
    # Rotate the corners around the camera y-axis by the yaw (rotation_y) angle.
    rot_mat = np.array([[cos(bbox3d_roty), 0, sin(bbox3d_roty)],
                        [0, 1, 0],
                        [-sin(bbox3d_roty), 0, cos(bbox3d_roty)]])
    pts_array = np.matmul(rot_mat, pts_array).transpose()
    for pt_array in pts_array:
        box_pts.append(np.append(pt_array + bbox3d_center, 1))
        box_3d.append(get_img_pt(np.append(pt_array + bbox3d_center, 1), calib))
    # Draw the vertical edges and the top and bottom rectangles.
    for i in [0, 1, 2, 3]:
        pt1, pt2, pt3, pt4 = box_3d[i % 4], box_3d[(i+1) % 4], box_3d[(i+4) % 8], box_3d[(i+5) % 8]
        pt5, pt6 = box_3d[(i % 4) + 4], box_3d[((i+1) % 4) + 4]
        cv2.line(img, pt1, pt2, (0, 0, 255), 1)
        cv2.line(img, pt1, pt3, (0, 0, 255), 1)
        cv2.line(img, pt2, pt4, (0, 0, 255), 1)
        cv2.line(img, pt5, pt6, (0, 0, 255), 1)
    # Draw two intersecting lines on the front face of the 3D bbox.
    cv2.line(img, box_3d[0], box_3d[-1], (0, 0, 255), 1)
    cv2.line(img, box_3d[3], box_3d[4], (0, 0, 255), 1)
    center_pt_img = get_img_pt(np.append(bbox3d_center, 1), calib)
    cv2.circle(img, center_pt_img, 3, (255, 255, 255), -1)

    return box_pts

def get_img_pt(pt, calib):
    # Project a homogeneous camera-frame point onto the image using the P2 matrix.
    projected_point = np.dot(calib['P2'], pt)
    projected_point = projected_point[:2] / projected_point[2]
    return (int(projected_point[0]), int(projected_point[1]))

Maybe you can suggest what changes I need to make so that the drawn bounding boxes face in the right direction given your angles?

WeijingShi commented 4 years ago

If you want the correct direction, you need a yaw angle within [0, 2pi]. The pre-trained model only outputs half of that range ([-0.25pi, 0.75pi]), so drawing with its output can flip the direction by 180 degrees. You can change the network output to cover [0, 2pi] by doubling the prediction headers and retraining, as we discussed.