christian-rauch / MultiMotionFusion

Sparse-Dense Motion Modelling and Tracking for Manipulation without Prior Object Models
https://doi.org/10.1109/LRA.2022.3200177
GNU General Public License v3.0

How to get the quantitative results as in the paper? #6

Open ung264 opened 2 weeks ago

ung264 commented 2 weeks ago

Hi,

Thanks again for publishing your work!

I am trying to reproduce the camera and object tracking results in Table I and Table II of the RAL paper. For camera tracking on the manipulation and rotation sequences, I used "-init tf -init_frame camera_true" as arguments and took the logged poses-0.txt as the ground-truth camera poses. I then evaluated the ATE using the evaluate_ate.py script provided with the TUM RGB-D dataset and got results comparable to those in the paper.

But for object tracking, it seems that I cannot use -init tf to get the ground-truth pose. How should I evaluate the object tracking trajectory quantitatively? Could you provide some more details on the steps for obtaining the results in the tables? Thanks!

christian-rauch commented 2 weeks ago

For camera tracking on the manipulation and rotation sequences, I used "-init tf -init_frame camera_true" as arguments and took the logged poses-0.txt as the ground-truth camera poses.

Since the bag files already store the ground truth camera trajectory captured from the Vicon system, you can run with the options -init tf -init_frame camera_true, where camera_true is the true camera frame, and export the true camera trajectory.
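For reference, exporting that true camera trajectory could look roughly like the sketch below; the topic remappings match the commands further down in this thread, and the export directory is only a placeholder:

# sketch: export the ground-truth camera trajectory (poses-0.txt) taken from the Vicon tf frames
MultiMotionFusion \
    -ros -dim 640x480 \
    colour:=/rgb/image_raw \
    depth:=/depth_to_rgb/image_raw/filtered \
    camera_info:=/rgb/camera_info \
    _image_transport:=compressed \
    -run -q \
    -em -ep \
    -exportdir /tmp/eval/$logname/true \
    -init tf -init_frame camera_true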

I then evaluated the ATE using the evaluate_ate.py script provided with the TUM RGB-D dataset and got results comparable to those in the paper.

That's good to hear. Replaying the bag files in parallel should give you similar results. However, due to the non-determinism of having the producer (bag file) and consumer (ROS node) in separate processes, the results may vary slightly. Also, we do not process all 30 frames per second and the results may change with faster/slower processing time.

But for object tracking, it seems that I cannot use -init tf to get the ground-truth pose. How should I evaluate the object tracking trajectory quantitatively? Could you provide some more details on the steps for obtaining the results in the tables?

The -init and -init_frame options only work for the camera trajectory. We cannot spawn the true objects, since there is no ground truth segmentation.

The true trajectory of the object on the conveyor belt was recorded separately with Vicon markers. The raw bag files do not contain the Vicon tracking of the objects, since the markers would interfere with the visual tracking. Can you check if you can reproduce the results with these true-1.txt files: mmf_object_path.zip? The archive contains one folder per sequence name, each with two subfolders: icp (Co-Fusion baseline) and flow_crf (proposed method). The icp folder does not contain a true-1.txt where detection or tracking failed.
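To be explicit about the structure, the archive should roughly unpack to the following layout (sequence names correspond to the bag file names):

mmf_object_path/
  [sequence_name]/
    flow_crf/true-1.txt   # proposed method
    icp/true-1.txt        # Co-Fusion baseline; missing where detection or tracking failed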

As with the camera trajectory, you can use the evaluate_ate.py script from the rgbd_benchmark_tools SVN (https://cvg.cit.tum.de/data/datasets/rgbd-dataset/tools) and compare with the exported poses-1.txt.
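As a sketch, evaluating the object trajectory of one sequence could then look like the following; tool_path, the sequence name and the export directory are placeholders, and --max_difference is the same value as in the camera evaluation further down:

${tool_path}/evaluate_ate.py --max_difference=20000000 \
    mmf_object_path/${SEQ}/flow_crf/true-1.txt \
    /tmp/eval/${SEQ}/flow_crf/poses-1.txt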

ung264 commented 2 weeks ago

Thanks for your prompt reply.

Following your advice, I have estimated the ATE and RPE for the two estimation bag files, and the ATE of the object tracking for the segmentation bag files. For deterministic behaviour, I read the bag files frame-by-frame using the “-l” argument. I did not get the same results, but the trend is clear: in terms of performance, MMF(S+D) > MMF(S) > CF. The results are in MMF_result.pdf. I have several questions about them.

  1. I consistently get a smaller error compared to the results in the paper. I think this might be caused by different icpStepMap, rgbStepMap, rgbResMap, and so3StepMap settings in GPUConfig.h. In theory, these values should not affect the tracking accuracy, but in my experiments they have a large impact on the final results. This might be a bug in CoFusion. I am using the arguments shown in the attached screenshot; could you share the arguments that you used?
  2. Besides the influence of the step parameters, I would also like to make sure that my evaluation protocol is correct. I used trans_estim.sh and motion_seg.sh to generate the estimated poses, and then evaluate_trans_estim.sh and evaluate_motion_segm.sh to compute the ATE and RPE. To correctly associate the timestamps, I converted the int64 timestamps into floats, e.g. 1633363383413638942 to 1633363383.413638942 (a conversion sketch follows this list). Could you have a look at the scripts to see if anything went wrong? scripts.zip
  3. I have always used the true-1.txt in bag_name/flow_crf as ground truth. It is a bit confusing to me why there are two folders, flow_crf and icp, for each bag file. Could you explain the difference?
  4. Are the MMF results in Table II obtained with “-redetection” enabled or not?
  5. How were the estimated poses obtained for the paper? By running MMF as a ROS node or by reading from the bag files?
  6. The published dataset features sequences with either a moving camera only or moving objects only. Have you recorded a bag file with both the camera and an object moving? If so, would it be possible to share that file?
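For reference, this is roughly how I convert the timestamps (assuming TUM-format text files with an integer nanosecond timestamp in the first column; the string-based split avoids rounding the 19-digit integers; the file name is just an example):

# split the integer nanosecond timestamp into <seconds>.<nanoseconds>, keeping header comments
awk '/^#/ { print; next } { n = length($1); $1 = substr($1, 1, n-9) "." substr($1, n-8); print }' poses-1.txt > poses-1_sec.txt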

Thanks in advance for your help! It’s really appreciated.

christian-rauch commented 1 week ago
  1. I consistently get a smaller error compared to the results in the paper. I think this might be caused by different icpStepMap, rgbStepMap, rgbResMap, and so3StepMap settings in GPUConfig.h. In theory, these values should not affect the tracking accuracy, but in my experiments they have a large impact on the final results. This might be a bug in CoFusion. I am using the arguments shown in the attached screenshot; could you share the arguments that you used?

I was using an "Intel Core i9-9900KF" and a "Nvidia GeForce RTX 2080 SUPER" and did not optimise the numThreads, numBlocks, etc. I was using the defaults. Effectively, the ROS node was publishing the tracking results at 8 Hz while the "Azure Kinect DK" could do 30 Hz. If you use a faster CPU and GPU, then the live realtime processing (e.g. from a bag file) should provide better results.

You should get similar performance results across different GPUs when you run in the deterministic frame-by-frame mode.
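As a sketch, a deterministic run of the proposed method would then look roughly like the command below; I am assuming here that -l is given the path of the recorded sequence, as in your frame-by-frame setup, and that the remaining options match the commands listed further down:

# sketch only: deterministic frame-by-frame processing; bag path and export directory are placeholders
MultiMotionFusion \
    -dim 640x480 \
    -l [bagfile] \
    colour:=/rgb/image_raw \
    depth:=/depth_to_rgb/image_raw/filtered \
    camera_info:=/rgb/camera_info \
    -run -q \
    -em -ep \
    -exportdir /tmp/eval/$logname/kpinit \
    -model [workspace]/install/super_point_inference/share/weights/SuperPointNet.pt \
    -init kp -icp_refine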

  2. Besides the influence of the step parameters, I would also like to make sure that my evaluation protocol is correct. I used trans_estim.sh and motion_seg.sh to generate the estimated poses, and then evaluate_trans_estim.sh and evaluate_motion_segm.sh to compute the ATE and RPE. To correctly associate the timestamps, I converted the int64 timestamps into floats, e.g. 1633363383413638942 to 1633363383.413638942. Could you have a look at the scripts to see if anything went wrong? scripts.zip

Just skimming over the scripts, the parameters provided to MultiMotionFusion look correct.

For reference, the parameters I used for the camera pose estimation experiments were:

# CoFusion baseline (ICP)
MultiMotionFusion \
    -ros -dim 640x480 \
    colour:=/rgb/image_raw \
    depth:=/depth_to_rgb/image_raw/filtered \
    camera_info:=/rgb/camera_info \
    _image_transport:=compressed \
    -run -q \
    -em -ep \
    -exportdir /tmp/eval/$logname/icp

# MMF with keypoint tracking only (no refinement)
MultiMotionFusion \
    -ros -dim 640x480 \
    colour:=/rgb/image_raw \
    depth:=/depth_to_rgb/image_raw/filtered \
    camera_info:=/rgb/camera_info \
    _image_transport:=compressed \
    -run -q \
    -em -ep \
    -exportdir /tmp/eval/$logname/norefine \
    -model [workspace]/install/super_point_inference/share/weights/SuperPointNet.pt \
    -init kp

# MMF with keypoint tracking and dense refinement
MultiMotionFusion \
    -ros -dim 640x480 \
    colour:=/rgb/image_raw \
    depth:=/depth_to_rgb/image_raw/filtered \
    camera_info:=/rgb/camera_info \
    _image_transport:=compressed \
    -run -q \
    -em -ep \
    -exportdir /tmp/eval/$logname/kpinit \
    -model [workspace]/install/super_point_inference/share/weights/SuperPointNet.pt \
    -init kp -icp_refine

And for the evaluation, I used something like

ATE=`${tool_path}/evaluate_ate.py --max_difference=20000000 "${eval_path}/${SEQ}/true/poses-0.txt" "${eval_path}/${SEQ}/${METH}"`
RPE=`${tool_path}/evaluate_rpe.py "${eval_path}/${SEQ}/true/poses-0.txt" "${eval_path}/${SEQ}/${METH}"`
echo "ATE RMSE: ${ATE} m"
echo "RPE RMSE: ${RPE}"
  3. I have always used the true-1.txt in bag_name/flow_crf as ground truth. It is a bit confusing to me why there are two folders, flow_crf and icp, for each bag file. Could you explain the difference?

I think this was done because the original true trajectory was centred such that it starts at the position where the object is spawned by the segmentation method. If you plot the trajectories from both folders in a sequence and centre their starting points, I think you should get the same trajectory.
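If you want to compare them visually, a simple way to centre a TUM-format trajectory at its first position is something like the line below (translation only, the orientation columns are left untouched; the file name is just an example):

# subtract the first position from every pose (format: timestamp tx ty tz qx qy qz qw), skipping header comments
awk '!/^#/ { if (!init) { x0 = $2; y0 = $3; z0 = $4; init = 1 } $2 -= x0; $3 -= y0; $4 -= z0; print }' true-1.txt > true-1_centred.txt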

  4. Are the MMF results in Table II obtained with “-redetection” enabled or not?

They are without redetection. That means that the tracking would create a new model with a new ID once the method loses track.

  5. How were the estimated poses obtained for the paper? By running MMF as a ROS node or by reading from the bag files?

All the evaluations in the tables were done in the live/real-time mode, with the bag file played back in parallel to the running ROS node.
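In other words, the bag file is replayed in a second terminal while the node is running, roughly like this (the playback command depends on your ROS distribution; the bag name is a placeholder):

# terminal 1: one of the MultiMotionFusion commands above (e.g. the kpinit configuration)
# terminal 2: replay the sequence in parallel
ros2 bag play [bagfile]    # or: rosbag play [bagfile].bag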

  6. The published dataset features sequences with either a moving camera only or moving objects only. Have you recorded a bag file with both the camera and an object moving? If so, would it be possible to share that file?

As far as I can remember, I did not specifically record a bag file for the "redetection and pick & place" experiment, where the robot picks and places the object from the conveyor belt onto the table (Section IV.D and Figure 9).