NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

getting pose from test data without going through ROS #76

Open · trevoravant opened this issue 5 years ago

trevoravant commented 5 years ago

I have some test images (and associated camera intrinsics) that I would like to get pose estimates from. I'm hoping there's a way to do this in Python only (i.e. without ROS). I know you have provided the train.py script, which has a --dataset option, so I figured I might be able to modify that. But in that file, it seems like only belief maps and output affinities come out of the network, and not the pose itself. Reading through your paper, I see that there is a post-processing step in which the belief maps are converted to a pose.

I've tried looking through the ROS and detector.py code, but haven't been able to figure out how to get the pose from a belief map. Is there an easy way? If it's simple enough, do you think you could provide some sample code?

Thanks

Abdul-Mukit commented 5 years ago

Hi @trevoravant, I also needed that. I just used the functions provided by the authors to implement it without ROS. I used both a webcam and an Intel RealSense camera. I have my code here. You can try it out if you want.

For the webcam, I use "live_dope_webcam.py".
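
Roughly, the non-ROS flow boils down to something like the minimal sketch below. The module names (detector.py, cuboid.py, cuboid_pnp_solver.py), the ModelData / ObjectDetector.detect_object_in_image / CuboidPNPSolver calls, the weight path, intrinsics and cuboid dimensions are assumptions based on this repo's inference code, so double-check them against the actual sources:

    # Minimal sketch of single-image DOPE inference without ROS (assumptions:
    # detector.py, cuboid.py and cuboid_pnp_solver.py from this repo are on the
    # Python path; the weight file, intrinsics and cuboid size are placeholders).
    import numpy as np
    from PIL import Image

    from detector import ModelData, ObjectDetector
    from cuboid import Cuboid3d
    from cuboid_pnp_solver import CuboidPNPSolver

    # Load the trained network for one object (hypothetical path).
    model = ModelData('cracker', 'weights/cracker_60.pth')
    model.load_net_model()

    # Camera intrinsics for your own camera (fx, fy, cx, cy).
    camera_matrix = np.array([[641.5, 0.0, 320.0],
                              [0.0, 641.5, 240.0],
                              [0.0, 0.0, 1.0]])

    # PnP solver for this object; cuboid dimensions in cm (placeholder values).
    pnp_solver = CuboidPNPSolver('cracker', camera_matrix,
                                 Cuboid3d([16.4, 21.3, 7.2]),
                                 dist_coeffs=np.zeros((4, 1)))

    # Detection thresholds, mirroring the config used in the ROS node.
    config_detect = lambda: None
    config_detect.thresh_angle = 0.5
    config_detect.thresh_map = 0.01
    config_detect.sigma = 3
    config_detect.thresh_points = 0.1

    img = np.array(Image.open('test.png').convert('RGB'))
    results = ObjectDetector.detect_object_in_image(model.net, pnp_solver, img,
                                                    config_detect)
    for r in results:
        print(r['location'], r['quaternion'])  # pose from PnP, location in cm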

TontonTremblay commented 5 years ago

I have code to do that; I will reply in this thread this week with an example, if that is OK with you.

trevoravant commented 5 years ago

@Abdul-Mukit Thank you so much! That code is super helpful!

@TontonTremblay Thank you, that would be great!

TontonTremblay commented 5 years ago

In the main train loop, you have to process the batch frame by frame:

        config_detect = lambda: None
        config_detect.mask_edges = 1
        config_detect.mask_faces = 1
        config_detect.vertex = 1
        config_detect.treshold = 0.5
        config_detect.softmax = 1000
        config_detect.thresh_angle = 0.5
        config_detect.thresh_map = 0.01
        config_detect.sigma = 2
        config_detect.thresh_points = 0.1

        for i_batch in range(opt.testbatchsize):
                rotations     = np.array(targets['rot_quaternions'][i_batch])
                translations  = np.array(targets['translations'][i_batch])
                matrix_camera = np.array(targets['matrix_camera'][i_batch])
                cuboid        = Cuboid3d(np.array(targets['cuboid'][i_batch]))
                filename      = targets['file_name'][i_batch]
                pointsBelief  = targets['pointsBelief'][i_batch].numpy()

                pnp_solver = CuboidPNPSolver(filename,matrix_camera,cuboid)
                detected_objects = ObjectDetector.find_object_poses(
                    output_belief[-1][0], output_affinities[-1][0], pnp_solver, config_detect)

                add = ADDErrorCuboid(
                    pose_gu = GetPoseMatrix(detected_objects[0]['location'],
                                            detected_objects[0]['quaternion']),
                    pose_gt = GetPoseMatrix(translations[0], rotations[0]),
                    cuboid  = cuboid
                )

This assumes the dataloader returns the information referenced above; it also assumes there is only one object in the image. You can use FAT single (which is what I used to use).
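
For reference, a sketch of what that per-batch dict could look like. The field names come from the snippet above; the shapes, units and dummy values are assumptions that depend on how you write your own dataloader (and on what your Cuboid3d expects):

    # Sketch of the fields the loop above expects from the dataloader
    # (names from the snippet; shapes, units and values are placeholders).
    import torch

    B, n_obj = 1, 1  # batch size, objects per image
    targets = {
        'rot_quaternions': torch.zeros(B, n_obj, 4),      # ground-truth quaternions (xyzw)
        'translations':    torch.zeros(B, n_obj, 3),      # ground-truth translations (e.g. cm)
        'matrix_camera':   torch.eye(3).repeat(B, 1, 1),  # camera intrinsics
        'cuboid':          torch.zeros(B, 8, 3),          # cuboid definition passed to Cuboid3d
        'file_name':       ['000001.png'] * B,            # image paths
        'pointsBelief':    torch.zeros(B, 9, 2),          # ground-truth 2D keypoint locations
    }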

Here is the ADD function (I cheated a little bit: instead of loading the full 3D model, for quick testing I am only looking at the cuboid error):

    # Requires: import numpy as np; from scipy import spatial
    def ADD_error_cuboid(self, pred_pose, actual_pose, cuboid):
        """Mean distance between the cuboid vertices transformed by the
        predicted pose and by the ground-truth pose (ADD on the cuboid only)."""
        # The 8 cuboid vertices as a homogeneous 4x8 array.
        vertices = np.array(cuboid._vertices)
        vertices = np.insert(vertices, 3, 1, axis=1)
        vertices = np.rot90(vertices, 3)

        pred_obj = np.matmul(pred_pose, vertices)
        actual_obj = np.matmul(actual_pose, vertices)

        # Pairwise distances; the diagonal pairs vertex i with vertex i.
        dist = spatial.distance.cdist(pred_obj.T, actual_obj.T, 'euclidean')
        true_dist = [dist[i][i] for i in range(len(dist))]
        return np.mean(true_dist)

The function for building the pose matrix:

    def GetPoseMatrix(location, rotation):
        """
        Return the 4x4 pose matrix built from a translation vector and a
        quaternion given in xyzw order.
        """
        import numpy as np
        from pyquaternion import Quaternion

        pose_matrix = np.zeros([4, 4])
        q = Quaternion(x=rotation[0], y=rotation[1], z=rotation[2], w=rotation[3])

        pose_matrix[0:3, 0:3] = q.rotation_matrix
        pose_matrix[0:3, 3] = np.array(location)
        pose_matrix[3, 3] = 1

        return pose_matrix

I think this is the main idea. Obviously this will not work right away: you need to modify the data loader, and you will probably have to make sure the data flows through correctly.

trevoravant commented 5 years ago

I tried to implement the suggestions provided by @TontonTremblay, but I didn't have time to finish. I figured I'd provide some notes on my progress, which may be helpful to someone else who tries. As suggested, I was working with the Deep_Object_Pose/scripts/train.py file. Please correct me if I'm wrong anywhere.

I didn't need to get pose from test data while training, so I ended up using the sample code that @Abdul-Mukit provided. This code only needed to be modified slightly for my purposes, and worked well for me.

cianohagan commented 4 years ago

Hey @TontonTremblay, thanks for the code snippets, they're very helpful.

I am attempting to evaluate predictions on my test set of a custom object by calculating the ADD accuracy metric that is reported in your paper.

Is the ADD_error_cuboid function above returning the average 3D Euclidean distance, in meters, between the keypoints of the ground-truth cuboid and the keypoints of the predicted cuboid? If so, the values I am obtaining when executing this function are larger than one would expect.

TontonTremblay commented 4 years ago

Yeah, what you described is correct, but you might need to debug it a little bit. The cuboid creation function might be a little off for your object. There might also be a cm vs. mm vs. m mismatch creeping up on you when applying the ground-truth transform vs. the one given by PnP.

cianohagan commented 4 years ago

Which ground-truth transform are you referring to here?

If we are getting the distance between the two arrays, would it not be in terms of cm? The ADD calculation is the distance between the projected and actual arrays, which are created from the cuboid._vertices list, i.e. the list of the 8 vertices of the cuboid.

These vertices are calculated from the cuboid.size3d list, which is obtained by reading the object dimensions from _object_settings. Are these dimensions not in cm? They appear to be in cm when examining the Falling Things dataset.

Thanks again for your help with this.

cianohagan commented 4 years ago

For context, my mean ADD is 19.35 and my median is 17.331 on my test set, where the predictions appear to be visually quite accurate.

TontonTremblay commented 4 years ago

@cianohagan If the ADD you shared is in mm, I would say this is expected; around 2 cm is what we observed in our robotics experiments.

IbrahimMCode commented 4 years ago

Hello,

What kind of changes should we make to the data loader?

TontonTremblay commented 4 years ago

You want the pose from the data loader directly. Just check the load_json function and get translation and rotation_wxyz from it; this is the pose decomposed into translation and rotation. Use pyquaternion to build the pose matrix from them. I hope this helps.
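
A rough sketch of that, assuming FAT/NDDS-style annotation files (the 'objects', 'location' and 'quaternion_xyzw' field names follow the FAT convention; check them against your own JSON and the load_json code):

    # Sketch: build a 4x4 ground-truth pose matrix from a FAT/NDDS-style JSON.
    # Field names are assumptions based on the FAT annotations; verify them.
    import json
    import numpy as np
    from pyquaternion import Quaternion

    with open('000001.json') as f:
        anno = json.load(f)

    obj = anno['objects'][0]                  # first object in the frame
    t = np.array(obj['location'])             # translation (cm in FAT)
    qx, qy, qz, qw = obj['quaternion_xyzw']   # note the xyzw order

    pose_gt = np.eye(4)
    pose_gt[:3, :3] = Quaternion(x=qx, y=qy, z=qz, w=qw).rotation_matrix
    pose_gt[:3, 3] = t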

IbrahimMCode commented 4 years ago

In order to test the mustard model I'm using the scripts you provided above. I'm testing a group of images, saving the output location (before converting to meters, since the ground-truth labels are in cm) and orientation, and comparing them to the ground-truth location and quaternion_xyzw from the JSON files (FAT dataset), because I've seen in load_json that you load the location and append it to translations, and likewise the quaternion to rotations. I also used pyquaternion to get the pose matrix for both. (Please correct me if I did a wrong step.)

After testing I'm getting an unexpected result of around 30 cm, while the predicted bounding boxes are visually accurate.

TontonTremblay commented 4 years ago

This sounds like the right way to do it. Can you visualize your pose vs. the ground-truth pose you are using? You can do that with cv2.projectPoints (or similar); check the code to see how it is used. When you create your pyquaternion, make sure to pass x=, y=, z=, w=; we had issues with the pyquaternion constructor. So, two things: 1) check how you are building the pyquaternion, and 2) reproject the poses you are using to see if they are similar to the ones output by DOPE.
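
For the visual check, something along these lines should work. It is only a sketch: the img, pose_pred, pose_gt, vertices (8x3, object frame) and camera_matrix (3x3) variables are assumed to already exist and to be in consistent units:

    # Sketch: reproject the cuboid vertices under the predicted and ground-truth
    # poses and draw both on the image for a visual comparison.
    import cv2
    import numpy as np

    def draw_cuboid(img, pose, vertices, camera_matrix, color):
        rvec, _ = cv2.Rodrigues(pose[:3, :3])          # rotation matrix -> rvec
        tvec = pose[:3, 3].reshape(3, 1)
        pts2d, _ = cv2.projectPoints(vertices.astype(np.float64), rvec, tvec,
                                     camera_matrix.astype(np.float64),
                                     np.zeros((4, 1)))
        for p in pts2d.reshape(-1, 2):
            cv2.circle(img, (int(p[0]), int(p[1])), 4, color, -1)
        return img

    img = draw_cuboid(img, pose_pred, vertices, camera_matrix, (0, 0, 255))  # prediction
    img = draw_cuboid(img, pose_gt, vertices, camera_matrix, (0, 255, 0))    # ground truth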

IbrahimMCode commented 4 years ago

Thank you very much for your help! Is an average of 5 cm good for the mustard object?

TontonTremblay commented 4 years ago

I would not be surprised, but I don't have the data right now to look into it. Do you have the ADD curves?

IbrahimMCode commented 4 years ago

Yes, I got them from the paper, and I now have these results: https://drive.google.com/file/d/1eeqekChGLHAUDtqLRT5CfVpxGW2f2MBW/view?usp=sharing. These results are for testing the mustard object using the pre-trained weights (60 epochs), and the test images are only photorealistic images from the FAT dataset. I think it should look similar to the curve in the paper if I test more images.

You already mentioned that, instead of loading the full 3D model, for quick testing you are only looking at the cuboid error.

I just have two questions: by full 3D model, do you mean the full set of object points, the same as PoseCNN uses? Is that possible with DOPE?

Thank you!

TontonTremblay commented 4 years ago

If you only trained on the FAT dataset, your results are not going to be as good as when mixing in DR data. For the paper we used the PoseCNN models; Yu found the transform from our model to his. I hope this helps.

IbrahimMCode commented 4 years ago

The results here are not from my own training; for testing I used the weight files you provided for each object (60 epochs). Sorry for bothering you, but you didn't answer my question about what you mean by the full 3D model.

Thank you

Abduoit commented 4 years ago

Hello @cianohagan, I am trying to calculate the ADD metric. Can anyone please provide the code or show the steps to do that?

TontonTremblay commented 4 years ago

You can check the YCB-Video toolkit; there is an implementation in there.
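
For reference, the standard ADD metric used in that evaluation is the mean distance between the model's 3D points transformed by the estimated pose and by the ground-truth pose. A minimal sketch, with model_points standing in for vertices sampled from the full 3D model:

    # Sketch of the standard ADD metric. 'model_points' is an (N, 3) array of
    # points from the full 3D model, in the same units as the pose translations.
    import numpy as np

    def add_metric(pose_pred, pose_gt, model_points):
        pred = (pose_pred[:3, :3] @ model_points.T).T + pose_pred[:3, 3]
        gt = (pose_gt[:3, :3] @ model_points.T).T + pose_gt[:3, 3]
        return np.mean(np.linalg.norm(pred - gt, axis=1))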