christianrauch / apriltag_ros

ROS2 node for AprilTag detection
MIT License

Inaccurate Pose/TF Calculation #14

Closed a-uhlig closed 5 months ago

a-uhlig commented 1 year ago

While converting my system to ROS2 I was glad to find this port, thanks.

However, I noticed that the pose/TF accuracy is not as good as in ROS 1. After digging into the code, I saw that a different approach is used here to calculate the pose from the detections. I have tried the solution with a PnP solver (as in ROS 1) and the suggested library function (https://github.com/AprilRobotics/apriltag/wiki/AprilTag-User-Guide#pose-estimation). Both perform much better for my task. See the small example below (moving the camera on a smooth path).

I am curious if anyone else has experienced this issue, and if there is a particular reason for using the current method?

(example plot)

christianrauch commented 1 year ago

The issue was already reported by @MihailV1989 in https://github.com/christianrauch/apriltag_ros/issues/11#issuecomment-1310957374, which also compared the current initial implementation with the one from the apriltag library.

Thanks for doing these comparisons (to both of you). But I am not sure how to read your plot. Are you plotting the (x,y)-position of the marker in some frame over time? What about the z-coordinate and rotation? Do you know what the true tag pose is?

If you put a PR together and show that the new implementation is better than the old one, e.g. by comparison with the depth of an RGB-D camera or my ground-truth measurement, then I will be happy to review and likely merge it. I would also be interested in the runtime comparison between both implementations, as you may want to give the user the choice between throughput and accuracy. I think @MihailV1989 already wanted to submit a PR. I am not sure what the status of this is. Maybe you can work on this together?

a-uhlig commented 1 year ago

The issue was already reported by @MihailV1989 in #11 (comment), which also compared the current initial implementation with the one from the apriltag library.

Ah, thanks, that is the same problem I have. I had not noticed it because of the mixed topics/title.

Thanks for doing these comparisons (to both of you). But I am not sure how to read your plot. Are you plotting the (x,y)-position of the marker in some frame over time? What about the z-coordinate and rotation? Do you know what the true tag pose is?

Sorry, my plot is not ideal / not well explained. Yes, it is the x and y position in the camera frame over time. I omitted z and the rotation for a simpler plot (my application is basically 2D: I have a top-view camera over flat ground).

Providing ground truth is not so easy; I'll have to think about it again. Which ground-truth measurement do you use/suggest?

If you put a PR together and show that the new implementation is better than the old one, e.g. by comparison with the depth of an RGB-D camera or my ground-truth measurement, then I will be happy to review and likely merge it. I would also be interested in the runtime comparison between both implementations, as you may want to give the user the choice between throughput and accuracy. I think @MihailV1989 already wanted to submit a PR. I am not sure what the status of this is. Maybe you can work on this together?

Yes, I guess the speed is lower, but I have not measured this yet. Maybe I can do this later.

christianrauch commented 1 year ago

Which ground-truth measurement do you use/suggest?

I think the simplest is to just keep the camera pose fixed, move the tag around on a table, and measure its relative displacement in x and y and its rotation. E.g. move it by 5 cm and then check if the estimated pose changes by 5 cm. Keep in mind that the origin of those poses is the optical frame. It will therefore be easier if you move the tag axis-parallel to the image plane.
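A quick numpy sketch of this displacement check (the pose values here are made up for illustration):

```python
import numpy as np

# Estimated tag positions (in the camera's optical frame) before and after
# sliding the tag 5 cm along the table, axis-parallel to the image plane.
# The values are made up for illustration.
p_before = np.array([0.120, 0.340, 1.000])
p_after = np.array([0.170, 0.340, 1.000])

# The estimated relative displacement should match the measured 5 cm.
displacement = np.linalg.norm(p_after - p_before)
print(f"estimated displacement: {displacement:.3f} m")  # expected: ~0.050 m
```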

christianrauch commented 1 year ago

I started looking into this myself and have now implemented all three methods behind the same interface for comparison. @a-uhlig Could you share your log file (rosbag) so I can reproduce your results?

a-uhlig commented 1 year ago

I am glad to hear that there is news here. Sorry for my late reply. I don't have enough time to deal with the AprilTags right now. I had created my example with a simulation but only recorded the final results. I will try to get back to it soon.

christianrauch commented 1 year ago

I had created my example with a simulation but only recorded the final results.

Great. That means we can compare the results with the ground truth. We could also use the log file for CI to make sure the performance does not degrade.

I started work on this on the pose branch: https://github.com/christianrauch/apriltag_ros/tree/pose

The node now has a parameter pose_method that can be set to one of these: from_homography, apriltag_homography, apriltag_orthogonal_iteration, or solve_pnp.

I would be interested in the comparison of estimation performance vs. runtime. If there is a clear winner (best performance and lowest runtime), I would simply use this as the estimation method. If there is a trade-off between some of the implementations, I would keep the method configurable.

TomGrimwood commented 1 year ago

Hi Christian,

I am interested in testing your pose branch, can you provide any more info on its use? Is it functional at the moment? Do you know anything about how https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_apriltag determines the orientation? I have found the orientation from isaac_ros_apriltag to be good; however, its AprilTag identification is much worse.

christianrauch commented 1 year ago

I am interested in testing your pose branch, can you provide any more info on its use?

It's described in the comment above yours (https://github.com/christianrauch/apriltag_ros/issues/14#issuecomment-1548265708): You use it as described in the repo's readme file, after checking out the pose branch. You have to set the ROS parameter pose_method to one of the 4 options I listed above.

KatoROSTeam commented 9 months ago

What is the current status for pose accuracy, is it fixed?

christianrauch commented 9 months ago

What is the current status for pose accuracy, is it fixed?

Can you specify which problem you have? Did you try any of the alternative methods from the pose branch? Do they fix the problem for you?

KatoROSTeam commented 9 months ago

I am having an accuracy problem with the detection in the current implementation: an offset of about 5 cm that scales up when the tag is at long range (as the range increases, the offset increases).

Yes, I tried the alternative methods, but the detected TF goes somewhere far away from the actual position or into the ground.

No, those did not fix my problem.

christianrauch commented 8 months ago

I am having an accuracy problem with the detection in the current implementation: an offset of about 5 cm that scales up when the tag is at long range (as the range increases, the offset increases).

Can you provide details on how you compare and get to this error value?

Yes, I tried the alternative methods, but the detected TF goes somewhere far away from the actual position or into the ground.

The original issue comment at the top is about comparing the pose estimation of the current implementation (setting pose_method to from_homography) with the one used in the apriltag library (function estimate_tag_pose) that can be used by setting pose_method to apriltag_orthogonal_iteration. If neither of the 4 options improves your results, then I am not sure what we are supposed to do.
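For reference, a conceptual numpy sketch of how a pose can be recovered from a tag homography. This only illustrates the idea behind the homography-based methods; it is not the actual code of the node or the apriltag library:

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover rotation R and translation t from a tag-plane homography.

    H maps tag-plane coordinates (z = 0) to pixel coordinates up to scale:
    H ~ K [r1 r2 t]. Simplified sketch without noise handling.
    """
    M = np.linalg.inv(K) @ H
    lam = 1.0 / np.linalg.norm(M[:, 0])  # recover the unknown scale
    if M[2, 2] * lam < 0:                # tag must lie in front of the camera
        lam = -lam
    r1, r2, t = lam * M[:, 0], lam * M[:, 1], lam * M[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    # Project onto the closest true rotation matrix (SVD orthonormalization).
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t

# Round trip: build H from a known (made-up) pose, then recover it.
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
R_true = np.eye(3)
t_true = np.array([0.1, -0.05, 1.2])
H = 2.7 * K @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])  # arbitrary scale
R_est, t_est = pose_from_homography(H, K)
```

With noisy corner detections, r1 and r2 are not exactly orthonormal, which is why the SVD step is needed and why homography-based poses tend to be less accurate than iterative PnP refinement.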

What results do you get when using the apriltag library (https://github.com/AprilRobotics/apriltag) without the ROS node?

KatoROSTeam commented 8 months ago

I'll check with the library alone without ROS node and let you know. Thanks for the update.

christianrauch commented 7 months ago

Did any of you manage to test the different pose estimation methods on the pose branch and can report some results? Could you also share log files with sequences where you experienced the issues?

lennartalff commented 7 months ago

Hey @christianrauch,

we still have to migrate our apriltag workflow to ROS2. We greatly appreciate the effort you put into this package. We can't promise anything yet, but if we can manage to spend some time on it, we might collect some data with our gantry system for the different methods you have made available in the pose branch.

That is, if no one else has done this already and is willing to share the results?

lennartalff commented 7 months ago

Evaluation Setup

I have collected some data using an Ignition Gazebo simulation, depicted below (apriltag_gazebo). The robot has a downward-looking camera, and the model was declared static to keep it in a defined position during data collection.

An example of the images captured by the camera is shown below (apriltag_camera_image). Instead of moving the camera/robot or an AprilTag around, multiple tags in the same image are considered. This way, the results for multiple different tag positions can be evaluated from a single image.

The camera is configured without distortion. Hence, the image can be considered as perfectly rectified and no errors due to distortion/rectification effects are induced.

The true tag poses and the pose of the camera are known. Thus, the pose estimates published on the /tf topic can be compared to the ground-truth data. The position error is defined as the Euclidean distance between the estimated tag position and the true tag position. The angle error is defined as the angle between the true tag's normal vector and the estimated normal vector computed by the apriltag node.
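The two error metrics can be stated precisely in a few lines of numpy (the input values below are illustrative):

```python
import numpy as np

def position_error(p_est, p_true):
    # Euclidean distance between estimated and true tag position.
    return np.linalg.norm(np.asarray(p_est) - np.asarray(p_true))

def angle_error(n_est, n_true):
    # Angle between the estimated and the true tag normal vector.
    n_est = np.asarray(n_est) / np.linalg.norm(n_est)
    n_true = np.asarray(n_true) / np.linalg.norm(n_true)
    return np.arccos(np.clip(np.dot(n_est, n_true), -1.0, 1.0))

# Illustrative values: 1 cm position offset, normal tilted by 5 degrees.
print(position_error([0.01, 0.0, 1.0], [0.0, 0.0, 1.0]))  # 0.01
print(np.degrees(angle_error([np.sin(np.radians(5)), 0.0, np.cos(np.radians(5))],
                             [0.0, 0.0, 1.0])))           # ~5.0
```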

Zero-mean Gaussian noise with a standard deviation of 0.1 was added to the camera. The experiments were conducted with three different orthogonal distances to the tags: 0.5 m, 1.0 m and 1.5 m. This way, the four pose estimation methods can be compared for different tag pixel sizes in the image. The edge length of the tags is 0.075 m.
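Assuming the noise is applied per pixel to normalized intensities, the noise model amounts to:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
image = np.full((480, 640), 0.5)  # synthetic constant-grey image, intensities in [0, 1]

# Zero-mean Gaussian noise with standard deviation 0.1, as in the setup above.
noisy = image + rng.normal(loc=0.0, scale=0.1, size=image.shape)
```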

The computation time was measured by calling std::chrono::high_resolution_clock::now() before and after calling the pose estimation function.
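The same measurement pattern, shown here as a Python sketch with time.perf_counter standing in for std::chrono::high_resolution_clock::now() and a dummy workload in place of the pose estimation call:

```python
import time

def timed_call(fn, *args):
    """Return the function result and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()  # plays the role of high_resolution_clock::now()
    result = fn(*args)
    return result, time.perf_counter() - start

# Stand-in workload for a pose estimation call.
result, seconds = timed_call(sum, range(1000))
print(f"{seconds * 1e6:.1f} microseconds")
```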

For each orthogonal distance to the tags, N=1000 pose estimations have been evaluated. Since there are not 1000 tags visible in a single image, the simulated camera stream is evaluated until the total count of detections equals the desired number of pose estimations. Note that the relative pose between tags and camera stays the same over the series of evaluated camera images. The only difference between successive images is the Gaussian noise applied to the simulated camera. Hence, the 'robustness' against noise will be reflected in the results.

The results are displayed as histograms. The respective x-axis displays the evaluated quantity (angle error, position error, computation time). Each y-axis represents how often (in absolute terms, with a sum of N=1000) the measured quantity occurred in the results.

| Name | Function/Parameter |
| --- | --- |
| Method A | "from_homography" |
| Method B | "apriltag_homography" |
| Method C | "apriltag_orthogonal_iteration" |
| Method D | "solve_pnp" |

Angle Error Plot

Note the different x-axis scale for each row! We observe similar performance for Method A and Method B. This is most likely explained by the fact that both methods implement the pose estimation via homography and should perform identically in general. For the larger camera-to-tag distances in the second and third rows, Method C and Method D show superior results. For the orthogonal camera-to-tag distance of 1.5 m in the third row, the mean angle error of Method D is only half of Method C's. Performance-wise, the methods can be ranked as follows: D > C > B = A. (angle_error_plot)

Position Error Plot

Again, note the difference in the x-axis scale between the rows. For the position error, in a nutshell, C and D perform quite similarly and seem to deliver the best results. Methods A and B are not quite on par regarding the position error. The ranking: D = C > B > A. (position_error_plot)

Computation Time Plot

Everything comes at a cost. The costs regarding the computation times are depicted in the image below. There are quite substantial differences between the methods. By far the fastest method is B, followed by A with an average computation time of about twice that of B. The computation time for D is roughly four times that of B, and by far the slowest is C. The ranking: B > A > D > C. (computation_time_plot)

Conclusion

Methods C and D seem to give more precise results but require more computation time. Since D gives results that are better than or as good as C's while having a substantially lower computation time, Method D seems to be superior to Method C. The same can be said for Methods A and B: Method B seems to give equally accurate or better pose estimates while having lower computation costs. Hence, B seems to be superior to A.

Note that the simulation setup used for the evaluation might differ severely from real-world conditions. Still, it might be sufficient to give a hint at the performance trade-offs: Methods A and B are fairly quick, while C and D seem to be more accurate.

I personally prefer Method D, since accuracy is more important than computation time for my application. Besides, I assume the apriltag detection algorithm does the heavy lifting, in contrast to the pose estimation. I would like to see Method D (solve_pnp) merged into the main branch. Since Methods B and C are part of the apriltag library: what would be a reason against keeping the pose estimation method configurable via a parameter, as in the pose branch?

If it is desired that the apriltag node implementation stay rather lean, I would tend to have at least Method B and Method D available. This would align with the suggestion of @christianrauch in https://github.com/christianrauch/apriltag_ros/issues/14#issuecomment-1548265708.

christianrauch commented 7 months ago

@lennartalff Thanks a lot for this evaluation. Regarding the plots, I see that the x-axis is the quantity you measure, but how should I interpret the y-axis? Is this a histogram over all the tags in the image? It doesn't seem to be normalised, but the "count" values also seem quite high for the approx. 91 tags in the image.

When you say

Since C gives better or equally good results as C while having substantially lower computation times

Do you mean "Since D gives better ..."?

Since there is no clear winner that performs best in accuracy and runtime, it definitely makes sense to have this as an option. From the averages of the graphs above, I would also conclude that B (apriltag_homography, i.e. AprilTag's estimate_pose_for_tag_homography) and D (solve_pnp, i.e. OpenCV's solvePnP) would be good options, with D as the default. In fact, solvePnP is the same method used by the original ROS 1 version of apriltag_ros. However, solvePnP itself again has a lot of options (estimation method) and variants (such as cv::solvePnPRansac()) that will have an effect on the runtime and accuracy.

lennartalff commented 7 months ago

I updated my text introducing the histogram plots to avoid misunderstandings and also corrected the C/D typo.

For each orthogonal distance to the tags, N=1000 pose estimations have been evaluated. Since there are not 1000 tags visible in a single image, the simulated camera stream is evaluated until the total count of detections equals the desired number of pose estimations. Note that the relative pose between tags and camera stays the same over the series of evaluated camera images. The only difference between successive images is the Gaussian noise applied to the simulated camera. Hence, the 'robustness' against noise will be reflected in the results.

The results are displayed as histograms. The respective x-axis displays the evaluated quantity (angle error, position error, computation time). Each y-axis represents how often (in absolute terms, with a sum of N=1000) the measured quantity occurred in the results.

In https://github.com/christianrauch/apriltag_ros/issues/21#issuecomment-1817918398 you considered splitting the detection and the pose estimation into two separate components. I think that this is quite an attractive option. The apriltag node's main task is to use the apriltag library, detect the corner points and publish a corresponding ROS message. The secondary task of estimating the pose is not as straightforward to implement, as there is a plethora of approaches/algorithms/tuning knobs to accomplish this task. Still, I guess that most people will consider pose estimation useful/required for their applications. Therefore, I would suggest implementing a reference node component providing Method B and Method D. If more control or other algorithms are required by users, they could replace the pose estimation component with their own implementation. This way "power users" are not limited by the methods implemented in this package. For all other users, the combination of the apriltag detection component and the pose estimation component provides a "batteries included" setup to get started with a fully functional pose estimation pipeline right away.

Let me know what you think about this. I would be happy to provide a PR if you think that it makes sense to adopt some of my suggestions.

P.S.: @christianrauch What is the reasoning behind the rotation of the apriltag's coordinate system by 180° around the x-axis? To me it was quite counterintuitive to not use the coordinate system as defined in the apriltag library.

christianrauch commented 6 months ago

@lennartalff I reduced the options in the original branch and created a PR for it: https://github.com/christianrauch/apriltag_ros/pull/24. As discussed, there are now only two options for the parameter pose_estimation_method: pnp and homography. Thanks again for your evaluation.

@MihailV1989, @a-uhlig, @TomGrimwood, @KatoROSTeam, @lennartalff Can you test if the PR with the default pnp option improves the pose estimation accuracy for you and report back?

christianrauch commented 6 months ago

The apriltag node's main task is to use the apriltag library, detect the corner points and publish a corresponding ROS message.

I would say the apriltag library's main function is to estimate the pose of AprilTags.

Therefore, I would suggest to implement a reference node component providing Method B and Method D. If more control or other algorithms are required by users, they could replace the pose estimation component by their own implementation.

You can already do that. The raw detections are published in any case and you can subscribe to them and the camera intrinsics to use an alternative method for pose estimation.

I think splitting the tag detection and pose estimation functionality into two separate nodes creates too much overhead. Both functionalities only take a couple of lines of code. If you look at the current implementation of the node, you will see that there is more code related to the framework (topics, callbacks, parameters, ...) than the actual tag detection and pose estimation functionality. The node is merely a wrapper around those libraries. Further separating the calls to library functions into nodes will further increase the overhead in maintaining the code. And even if the nodes are implemented as composable nodes, you will have runtime overhead and the overhead of managing two nodes instead of one in launch files etc. I guess that there are more people interested in extracting the pose of a tag than just the tag corners. At the moment you can access both from this single node without the hassle of managing two composable nodes. Did you envision that a dedicated "pose estimation from tag detection" node would be used independently of the tag detection? What would the interface look like? And why would this be easier (and more efficient) than just using OpenCV's cv::solvePnP, or your method of choice, directly via an API?

Overall, I think that there are not many use cases for a dedicated "pose estimation from tag detection" node, and that such a node would create too much overhead to merit this split.

What is the reasoning behind the rotation of the apriltag's coordinate system by 180° around the x-axis? To me it was quite counter intuative to not use the coordinate system as defined in the apriltag library

If I am not mistaken, this is done in the pose estimation methods in the apriltag library function estimate_pose_for_tag_homography and in the ROS 1 node. In estimate_pose_for_tag_homography, a transformation fix is applied to the originally estimated pose M_H. The initial_pose coordinate frame's z-axis should point away from the front side of the tag. In the ROS 1 node with cv::solvePnP, the orientation of the coordinate frame is defined by the coordinates of the "object points" on the 3D unit tag frame and their correspondence to the 2D coordinates on the tag with the counterclockwise order.

IMHO, this is more intuitive as tags are mostly attached to the surface of an object, with the z-axis pointing away from this object, just like the world frame on a ground plane.
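To make the 180° flip concrete, here is a small numpy sketch (not the library's code): rotating by 180° about the x-axis keeps the x-component of a vector and negates the y- and z-components, so a frame whose z-axis points into the supporting surface ends up with z pointing away from it.

```python
import numpy as np

# Rotation by 180 degrees about the x-axis.
R_x180 = np.array([[1.0,  0.0,  0.0],
                   [0.0, -1.0,  0.0],
                   [0.0,  0.0, -1.0]])

# A z-axis pointing into the supporting surface ...
z_in = np.array([0.0, 0.0, 1.0])

# ... is flipped to point away from it, matching the convention described above.
z_out = R_x180 @ z_in
```

Applying the flip twice is the identity, so the convention can be converted back and forth without loss.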