christianrauch / apriltag_ros

ROS2 node for AprilTag detection
MIT License

Multiple cameras #32

Closed: goatfanboi23 closed 2 months ago

goatfanboi23 commented 2 months ago

I am having trouble using the apriltag ros node when using more than one camera. The problem stems from having too many parents in the transformation tree for one child. Right now, we have a transformation from world->tag and a transformation that your node provides from 'camera->tag1'.

The camera->tag transformation has to be inverted to retain the tree structure, so camera->tag becomes tag->camera.

From this data, we create a tree that looks like this:

world -> tag -> camera.

From this, we can then get the transformation between the world and the camera. Then, we offset that pose by the distance from the camera to the center of the robot (the camera does not move). Ideally, this would be done with base_link. However, at the time, we only had one camera, and this solution was simpler.

This is already a weird tf tree, but it worked for our original design.

However, if we want to have two cameras and we invert the transformation, base_link would have two parents. Moreover, if we did not invert the transformation, the tag would have three parents. Either way, we end up breaking the tree structure.

Screenshot_20240425_191128_GitHub.jpg

Thank you for your help.

christianrauch commented 2 months ago

I don't properly understand your problem or why this is caused by the estimated tag poses.

If you have two cameras in your kinematic tree (both cameras have a known pose with respect to some other frame), both cameras see the same tag, and each camera has a dedicated apriltag node that publishes the estimated tag pose via the same child frame id, then you will not have a tree structure anymore but a graph with a split and merged path, since both cameras are connected to the same parent frame and produce the same child frame. You have to choose different child frame names, e.g. tag_left and tag_right, in order to obtain a proper TF tree; see the launch sketch below.
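For illustration, a minimal Python launch sketch of such a two-node setup. The executable name (apriltag_node), the parameters (family, size, tag.ids, tag.frames) and the image_rect/camera_info topics are taken from this repository's sample configs; the camera topics, tag id and frame names are placeholders for your own:

```python
from launch import LaunchDescription
from launch_ros.actions import Node

def generate_launch_description():
    # one apriltag node per camera; the same physical tag gets a
    # different child frame id in each node (tag_left vs. tag_right)
    return LaunchDescription([
        Node(
            package='apriltag_ros', executable='apriltag_node',
            name='apriltag_left',
            remappings=[('image_rect', '/camera_left/image_rect'),
                        ('camera_info', '/camera_left/camera_info')],
            parameters=[{'family': '36h11', 'size': 0.162,
                         'tag.ids': [0], 'tag.frames': ['tag_left']}],
        ),
        Node(
            package='apriltag_ros', executable='apriltag_node',
            name='apriltag_right',
            remappings=[('image_rect', '/camera_right/image_rect'),
                        ('camera_info', '/camera_right/camera_info')],
            parameters=[{'family': '36h11', 'size': 0.162,
                         'tag.ids': [0], 'tag.frames': ['tag_right']}],
        ),
    ])
```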

goatfanboi23 commented 2 months ago

Thank you for your help and clarification. I understand your child frame id solution for the tag transformations. However, I am wondering what the best way is to then look up the transformation between the tag and the world (the tag is fixed). We need this transformation so we can take the transformation between our camera and our tag and calculate our base frame's position in the world. The solution that comes to mind is to just store the positions in a dictionary (we are sadly using python), however if there is a ROS based solution that would be preferred. I am also wondering if it is possible to run the detections on the same apriltag node or if we have to use two separate nodes. Thank you for taking time out of your day to help me with my questions, it is deeply appreciated.

christianrauch commented 2 months ago

However, I am wondering what the best way is to then look up the transformation between the tag and the world (the tag is fixed). We need this transformation so we can take the transformation between our camera and our tag and calculate our base frame's position in the world.

If you have the following transformations given:

  1. world to base
  2. base to camera
  3. camera to tag (via the detector)

Then you can simply get the transformation from world to tag at any point in time and invert it to get the transformation from tag to world.
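As a minimal sketch of that lookup in Python (the frame names world and tag36h11:0 are placeholders for your own; tf2 performs the inversion for you when you swap the two frames in the lookup):

```python
import rclpy
from rclpy.node import Node
from rclpy.time import Time
from tf2_ros import Buffer, TransformListener

class TagToWorld(Node):
    def __init__(self):
        super().__init__('tag_to_world')
        self.buffer = Buffer()
        self.listener = TransformListener(self.buffer, self)
        self.timer = self.create_timer(1.0, self.lookup)

    def lookup(self):
        try:
            # pose of the "world" frame expressed in the tag frame,
            # i.e. the inverse of the world -> tag transformation
            t = self.buffer.lookup_transform('tag36h11:0', 'world', Time())
            self.get_logger().info(f'tag -> world: {t.transform}')
        except Exception as e:
            self.get_logger().warn(str(e))

rclpy.init()
rclpy.spin(TagToWorld())
```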

Or is one of the transformations missing and you actually want to calibrate it? In this case, you probably want to do an "Eye-in-Hand" calibration, e.g. via the MoveIt calibration tool: https://moveit.picknik.ai/humble/doc/examples/hand_eye_calibration/hand_eye_calibration_tutorial.html.

goatfanboi23 commented 2 months ago

After some preliminary reading, the hand-eye calibration seems like it could be applicable. Here are some details about our specific situation.

We have a robot (base_link) whose position is unknown relative to the world

We have a camera on said robot that is in a fixed position, meaning that we know the transformation between the base_link and camera

In the world, there are apriltags in a fixed position in the world, meaning we have a transformation between world and tag

From this information, we want to calculate the robot's position in the world (transformation between world and robot)

The robot is moving around, and our goal is to figure out its position in the world

I think my confusion stems from how the tf tree should be set up. If you could clarify, that would be helpful, and if you could point me in the direction of some reading, that would be appreciated (the ros wiki left me with more questions than answers).

christianrauch commented 2 months ago

If your chain is:

base --> camera --> tag <-- world

then you only need to invert your "world --> tag" transformation to get "tag --> world" and you get the chain:

base --> camera --> tag --> world

and this gives you the pose of the "base" frame in the "world" frame.
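A minimal sketch of that inversion in Python, assuming the standalone tf_transformations package and placeholder values for the fixed world --> tag transformation. Publishing the inverted transformation as a static tag --> world edge keeps the TF tree intact:

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import TransformStamped
from tf2_ros import StaticTransformBroadcaster
from tf_transformations import (concatenate_matrices, inverse_matrix,
                                quaternion_from_matrix, quaternion_matrix,
                                translation_from_matrix, translation_matrix)

class TagWorldPublisher(Node):
    def __init__(self):
        super().__init__('tag_world_publisher')
        # known, fixed world -> tag transformation (placeholder values)
        t_world_tag = concatenate_matrices(
            translation_matrix([2.0, 1.0, 0.5]),
            quaternion_matrix([0.0, 0.0, 0.0, 1.0]))  # [x, y, z, w]
        t_tag_world = inverse_matrix(t_world_tag)

        msg = TransformStamped()
        msg.header.stamp = self.get_clock().now().to_msg()
        msg.header.frame_id = 'tag36h11:0'  # parent: the detected tag frame
        msg.child_frame_id = 'world'
        x, y, z = translation_from_matrix(t_tag_world)
        msg.transform.translation.x = float(x)
        msg.transform.translation.y = float(y)
        msg.transform.translation.z = float(z)
        qx, qy, qz, qw = quaternion_from_matrix(t_tag_world)
        msg.transform.rotation.x = float(qx)
        msg.transform.rotation.y = float(qy)
        msg.transform.rotation.z = float(qz)
        msg.transform.rotation.w = float(qw)
        self.broadcaster = StaticTransformBroadcaster(self)
        self.broadcaster.sendTransform(msg)

rclpy.init()
rclpy.spin(TagWorldPublisher())
```

With this edge in the buffer, lookup_transform('world', 'base', ...) then yields the pose of the "base" frame in the "world" frame.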

But I still don't understand what this has to do with multiple cameras and the apriltag node.

goatfanboi23 commented 2 months ago

If your chain is:

base --> camera --> tag <-- world

then you only need to invert your "world --> tag" transformation to get "tag --> world" and you get the chain:

base --> camera --> tag --> world

and this gives you the pose of the "base" frame in the "world" frame.

But I still don't understand what this has to do with multiple cameras and the apriltag node.

For multiple cameras, I believe this is what it would look like. (The two tags highlighted in yellow are the same; the only difference is the id.)

Screenshot_20240501_160147_Samsung Notes.jpg

My questions are as follows:

  1. Can this be run on a single apriltag detection node?
  2. Is there a simpler way to organize the tree that does not have "duplicate" (for lack of another word) frames?

christianrauch commented 2 months ago
  • Can this be run on a single apriltag detection node?

No, you cannot use one apriltag node with multiple cameras. You have to create one apriltag node per camera.

  • Is there a simpler way to organize the tree that does not have "duplicate" (for lack of another word) frames?

If you do this as shown, you will create parallel branches for your kinematic chain. But if you merge the "base" link again, then you create a loop in your tree.

I still don't know why you want to use multiple cameras for this. If you just want to know where your base link is, you can use one camera with the chain above. If you want multiple estimates from multiple cameras, you have to use different frame names for the same tag and combine the two estimated poses (from your two cameras) in some way, e.g. naively average the positions, as in the sketch below.
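A naive fusion sketch in Python (representing each estimate as a position vector and an [x, y, z, w] quaternion is an assumption; adapt it to your message types): average the translations and interpolate the rotations halfway:

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between quaternions [x, y, z, w]."""
    q0 = np.array(q0, dtype=float) / np.linalg.norm(q0)
    q1 = np.array(q1, dtype=float) / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:      # flip one quaternion to take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:   # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0
            + np.sin(t * theta) * q1) / np.sin(theta)

def fuse(p0, q0, p1, q1):
    """Naively combine two (position, quaternion) estimates of one pose."""
    position = (np.asarray(p0, dtype=float) + np.asarray(p1, dtype=float)) / 2.0
    rotation = slerp(q0, q1, 0.5)
    return position, rotation

# e.g. the base pose as estimated via the left and the right camera
p, q = fuse([1.0, 0.0, 0.5], [0, 0, 0, 1],
            [1.1, 0.1, 0.5], [0, 0, 0.01, 1])
```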