Closed: MihailV1989 closed this issue 6 months ago
The API of the apriltag library changed and we incremented the version for that. That version has not been released to the official ROS repos yet. You will have to build it from source. The node requires this new version. If you just remove the version check, then you will get compilation errors.
I haven't implemented another pose estimation method as I didn't have the need for this. If you found a better solution, you can send a PR with some comparison results (speed, accuracy) and I will review it.
Thanks for the fast reply! In fact, I'm not getting any compilation errors after I removed the version check, only the wrong parameter type errors on launch. With the fixes I mentioned above the node is working correctly, so I'll just wait for the official ROS repos release.
As for the pose estimation method, I didn't find a new solution; I just implemented the official method described in the AprilTag Wiki: https://github.com/AprilRobotics/apriltag/wiki/AprilTag-User-Guide#pose-estimation
The accuracy gain of the pose estimation is quite obvious, and the exact method is documented in a paper. This is from apriltag_pose.h, which is used for the pose estimation:
```
Estimate pose of the tag. This returns one or two possible poses for the
tag, along with the object-space error of each.

This uses the homography method described in [1] for the initial estimate.
Then Orthogonal Iteration [2] is used to refine this estimate. Then [3] is
used to find a potential second local minima and Orthogonal Iteration is
used to refine this second estimate.

[1]: E. Olson, “Apriltag: A robust and flexible visual fiducial system,” in
     2011 IEEE International Conference on Robotics and Automation,
     May 2011, pp. 3400–3407.
[2]: Lu, G. D. Hager and E. Mjolsness, "Fast and globally convergent pose
     estimation from video images," in IEEE Transactions on Pattern Analysis
     and Machine Intelligence, vol. 22, no. 6, pp. 610-622, June 2000.
     doi: 10.1109/34.862199
[3]: Schweighofer and A. Pinz, "Robust Pose Estimation from a Planar Target,"
     in IEEE Transactions on Pattern Analysis and Machine Intelligence,
     vol. 28, no. 12, pp. 2024-2030, Dec. 2006. doi: 10.1109/TPAMI.2006.252

@outparam err1, pose1, err2, pose2
```
Then, in estimate_tag_pose() in apriltag_pose.c, the pose with the smaller error is returned, and this is the function I've used. It would also be useful to publish the final object-space error, but for this a new msg would be needed.
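To illustrate the selection step (a self-contained sketch with a hypothetical `Pose` stand-in for `apriltag_pose_t`, not the library's actual code), the logic boils down to keeping the candidate with the smaller object-space error, and returning the error as well would make it easy to publish:

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical stand-in for apriltag_pose_t: rotation + translation
// (the contents are irrelevant for the selection logic).
struct Pose {
    std::array<double, 9> R;
    std::array<double, 3> t;
};

// Mirrors the selection done in estimate_tag_pose(): given the two local
// minima and their object-space errors, keep the candidate with the smaller
// error. Returning the error alongside the pose would allow publishing it.
static std::pair<Pose, double> pick_best_pose(const Pose& pose1, double err1,
                                              const Pose& pose2, double err2) {
    return (err1 <= err2) ? std::make_pair(pose1, err1)
                          : std::make_pair(pose2, err2);
}
```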
> With the fixes I mentioned above the node is working correctly, so I'll just wait for the official ROS repos release.
Alternatively, you can just check out the library in your workspace and compile it together with the node.
> As for the pose estimation method, I didn't find a new solution; I just implemented the official method described in the AprilTag Wiki:
I just meant that if you find that this works better and is more accurate, you can send a PR with that proposed implementation in the node. I can review it, but it would be good to have some comparison in this PR that shows that the alternative implementation works better in speed and/or accuracy.
Will do. In the meantime I think I found an issue with the pose estimation. The node is using the camera intrinsic parameters, specifically the matrix P, without adjusting them to the actual image resolution. The CameraInfo message provides the original calibration resolution and the corresponding intrinsic parameters: http://docs.ros.org/en/melodic/api/sensor_msgs/html/msg/CameraInfo.html
They then have to be scaled proportionally to the ratio of the actual image resolution used for pose estimation to the calibration resolution: https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html#MathJax-Element-78-Frame
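The scaling itself is simple; here is a minimal self-contained sketch (my own illustration with a hypothetical `Intrinsics` struct, assuming plain resizing without cropping): the focal lengths and principal point scale linearly with the image dimensions.

```cpp
#include <cassert>

// Pinhole intrinsics as used for pose estimation
// (fx, fy in pixels; cx, cy is the principal point).
struct Intrinsics {
    double fx, fy, cx, cy;
};

// Scale intrinsics calibrated at (calib_w x calib_h) to an image of
// (img_w x img_h). Assumes a plain resize (no cropping): fx/cx scale
// with the width ratio, fy/cy with the height ratio.
Intrinsics scale_intrinsics(const Intrinsics& calib,
                            int calib_w, int calib_h,
                            int img_w, int img_h) {
    const double sx = static_cast<double>(img_w) / calib_w;
    const double sy = static_cast<double>(img_h) / calib_h;
    return {calib.fx * sx, calib.fy * sy, calib.cx * sx, calib.cy * sy};
}
```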
I've implemented this as well in my fork: https://github.com/MihailV1989/apriltag_ros/commit/a7c4d8b77c0e7152346be93b01b9a39fabee5a88
So far it's working correctly, and I've now tested it on a Raspberry Pi 4 with 1 GB RAM and headless Ubuntu. I'll provide performance test results in the next few days.
> They then have to be scaled proportionally to the ratio of the actual image resolution used for pose estimation to the calibration resolution:
Is this related to `quad_decimate` in the library and `detector.decimate` in the node? The dimensions of the image you get from the camera should match the values in the CameraInfo message. Is there a case where the processed image dimensions differ from the received image?
I would say it's inconvenient to be stuck with the original calibration resolution. If you're using a third-party camera image publisher and you calibrate the camera at, say, full resolution, then you have to manually scale the intrinsic parameters every time you change the resolution. In my case I'm starting a third-party publisher node and I have to give it the path to a .yaml file where the intrinsic camera parameters are saved. I could edit the file according to the set resolution, but this is only possible at startup, and repeated manual rescaling accumulates rounding errors over time. So I don't find this a good solution, as you won't be able to experiment easily with different resolutions.
Then of course there is the question of why not always use full resolution. Normally you want to find the optimal resolution for your application so that you don't overload your system. In my case the main limiting factors are the 1 GB of RAM and the camera bandwidth. But there are probably other use cases where you don't want to spend your resources publishing big images at high FPS only to decimate them later.
Then about the decimate function. If you're interested in the pose estimation capabilities, it does not make sense to reduce the resolution only for the quad detection that feeds the pose estimation while still using the full resolution for decoding the binary payload. I don't know how computationally intensive the decoding is, but when I run it on hardware with limited resources I don't want it to perform better than needed. When a tag moves away from the camera, the pose estimation becomes unusable much sooner than the decoding.
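For context, my understanding of how `quad_decimate` behaves (a rough self-contained sketch with hypothetical names, not the library's code): quads are searched on an image downscaled by the decimate factor, and the found corners are mapped back to full-resolution pixel coordinates, which is why decoding can still sample the full-resolution image and the full-resolution intrinsics remain valid.

```cpp
#include <cassert>

// A detected quad corner in pixel coordinates.
struct Corner {
    double x, y;
};

// Map a corner found on an image downscaled by 'decimate' back to
// full-resolution pixel coordinates. This is a rough model of what
// quad_decimate implies; the real library additionally refines the
// corners on the full-resolution image.
Corner to_full_resolution(const Corner& c, double decimate) {
    return {c.x * decimate, c.y * decimate};
}
```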
Maybe all this is not a big deal, but scaling the intrinsic camera parameters when the actual image resolution differs from the calibration resolution is even less problematic and makes the node more foolproof, isn't it?
> I would say it's inconvenient to be stuck with the original calibration resolution. If you're using a third-party camera image publisher and you calibrate the camera at, say, full resolution, then you have to manually scale the intrinsic parameters every time you change the resolution. In my case I'm starting a third-party publisher node and I have to give it the path to a .yaml file where the intrinsic camera parameters are saved. I could edit the file according to the set resolution, but this is only possible at startup, and repeated manual rescaling accumulates rounding errors over time. So I don't find this a good solution, as you won't be able to experiment easily with different resolutions.
I am still not sure what the exact problem is. What do you mean by being "stuck with the original calibration resolution"? Every camera setting, such as image dimensions, zoom, lens, etc., will have a dedicated set of intrinsic parameters.
When a ROS node publishes images, it also has to publish the corresponding intrinsics. When this node publishes a scaled version of the image, it also has to scale the intrinsics accordingly. You will see this with cameras supporting different image resolutions. Most of them will have dedicated intrinsics for every setting. Otherwise, you will have to calibrate them manually for every setting you use. It's the responsibility of the "sender" (e.g. a camera or generic image publisher) to make sure that image data and intrinsics are correct and match. The "receiver" (e.g. this apriltag node) cannot know how the images are scaled. It just relies on the image data and intrinsics.
> Then about the decimate function. If you're interested in the pose estimation capabilities, it does not make sense to reduce the resolution only for the quad detection that feeds the pose estimation while still using the full resolution for decoding the binary payload.
It depends on your application of course. If there is only a single image consumer/subscriber then publishing at a lower resolution makes sense. But if there are multiple consumers, you force them all to use the lower resolution, even though a higher resolution would be possible. So this is entirely up to your setting and you cannot easily generalise this. Some people will only want to reduce the resolution for the AprilTag detection, some will want to have a lower resolution for their entire setup. In any case, you have to provide the corresponding intrinsics.
> Maybe all this is not a big deal, but scaling the intrinsic camera parameters when the actual image resolution differs from the calibration resolution is even less problematic and makes the node more foolproof, isn't it?
If the image source is publishing scaled images it also has to scale the intrinsics accordingly. It's not the responsibility of the receiver to figure out scaling and adapt the intrinsics. It is much more foolproof when the image data and correct intrinsics are published by the sender.
So I reverted the last changes with the automatic adjustment of the camera intrinsic parameters.
I also managed to do a few simple tests: I ran a preview of the detected tags on the live camera image.
As mentioned in the beginning, I'm using ROS2 Galactic on Ubuntu 20.04.5 LTS in a virtual machine, and for video capture I'm using an OAK-D camera from Luxonis; the performance is rather bad. The host is an HP ZBook 15 G2 with an Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz, with 4 of 8 threads and 6 GB RAM assigned to the virtual machine.
The tags are 20 mm wide, with no decimation and no blur set for the AprilTag detector. I tried with and without the more precise pose estimation, with 1 or 4 threads, and at both 1280x720 and 960x540 image resolution. Results:
| # | refine-pose | detector.threads | width | height | detections topic avg. hz |
| --- | --- | --- | --- | --- | --- |
| 1 | TRUE | 1 | 1280 | 720 | 0.9 |
| 2 | FALSE | 1 | 1280 | 720 | 0.9 |
| 3 | TRUE | 4 | 1280 | 720 | 1.0 |
| 4 | FALSE | 4 | 1280 | 720 | 0.9 |
| 5 | TRUE | 1 | 960 | 540 | 1.6 |
| 6 | FALSE | 1 | 960 | 540 | 1.7 |
| 7 | TRUE | 4 | 960 | 540 | 1.5 |
| 8 | FALSE | 4 | 960 | 540 | 1.5 |
First of all, thanks for the ROS2 port!
I pulled the current version today and noticed that apriltag_ros cannot be built any more. I'm using ROS2 Galactic on Ubuntu 20.04.5 LTS in a virtual machine, and I got the following error:
The Galactic central index still doesn't contain apriltag 3.3: https://github.com/ros/rosdistro/blob/master/galactic/distribution.yaml so I wonder where I can get it?
Until then, I found out that I can remove the version requirement from CMakeLists.txt line 25 like this:
```cmake
find_package(apriltag REQUIRED)
```
But then I got other errors on launch. The initialization of the ROS2 parameters "detector.refine" and "detector.debug" fails even when I do not set the parameters, and a wrong parameter type error appears. Here is the one for "detector.refine"; they are both identical:
I managed to fix the error by reverting the data type back to int, as it was before the commits on Aug 28, 2022. Does this have something to do with the fact that I'm not using the newest version 3.3? The parameters are being initialized from the `apriltag_detector_t* const td`.
Then, as a side question, I was wondering why the more precise pose estimation from the apriltag library is not implemented. The pose estimation from the homography has very low precision and is almost unusable, isn't it? I took the time to implement it as an optional setting that is turned off by default, and I think it could be useful to others as well. I can gladly contribute it if you want: https://github.com/MihailV1989/apriltag_ros
I just cannot guarantee that there aren't any hidden bugs and that the code is well optimized, as I have no experience with C++. The more precise pose estimation can be turned on with the "refine-pose" parameter that I saw in a very old launch.py file a few months ago.