andrefdre / Dora_the_mug_finder_SAVI

Dora The Mug Finder: Detection and classification of items placed on top of a table using point cloud processing and neural networks.
GNU General Public License v3.0

Scenarios 5-8 and 13 object extraction (2D) does not work #43

Closed TatianaResend closed 1 year ago

TatianaResend commented 1 year ago

Scenes 5-8 and 13 do not have the camera's z axis pointing at the table, so 2D object extraction does not work: the first image of the scene contains no objects.

You need to implement a system that finds a scene image in which the objects are actually visible and extracts them from that image.

andrefdre commented 1 year ago

Hi, I was thinking about this, and I thought a good first try would be to pick up the scene poses for each image and use them as the rotation and translation matrices. Do you think it makes sense?

TatianaResend commented 1 year ago

> Hi, I was thinking about this, and I thought a good first try would be to pick up the scene poses for each image and use them as the rotation and translation matrices. Do you think it makes sense?

I tried this suggestion, but the results were not positive. Either the crop ended up in the middle of the images, or it gave an error (judging by the traceback, the crop comes out empty, so cv2_to_imgmsg divides by a zero height).

[Screenshot from 2023-01-19 16-26-32]

[ERROR] [1674151865.892081]: bad callback: <bound method Image.callback of <__main__.Image object at 0x7f1c85eef9a0>>
Traceback (most recent call last):
  File "/opt/ros/noetic/lib/python3/dist-packages/rospy/topics.py", line 750, in _invoke_callback
    cb(msg)
  File "/home/tatiana/catkin_ws/src/Dora_the_mug_finder_SAVI/dora_the_mug_finder_bringup/scripts/image_extractor.py", line 93, in callback
    self.cropped_images.images.append(self.bridge.cv2_to_imgmsg(cropped_image, "passthrough"))
  File "/opt/ros/noetic/lib/python3/dist-packages/cv_bridge/core.py", line 271, in cv2_to_imgmsg
    img_msg.step = len(img_msg.data) // img_msg.height
ZeroDivisionError: integer division or modulo by zero

I'm not sure what each value in the .pose file represents, but I think it's a quaternion (w, x, y, z) followed by a translation (x, y, z). I'm only sure that the first value corresponds to w.

Searching the documentation, the only thing reported is that the .pose file contains "3D scene reconstruction camera pose estimates".

TatianaResend commented 1 year ago

The transformation from the original frame to the object is known. The transformation from the original frame to the current frame is also known.

So instead of applying the camera transformation, I will try to apply the transformation between the current frame and the object.

[WhatsApp Image 2023-01-20 at 16 46 39]
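As a minimal sketch of that composition, assuming both poses are available as 4x4 homogeneous matrices (the names T_world_object and T_world_camera are illustrative, not from the code):

import numpy as np

def to_homogeneous(R, t):
    # Build a 4x4 homogeneous transform from a 3x3 rotation and a translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Illustrative poses: identity rotations, different translations
T_world_object = to_homogeneous(np.eye(3), [1.0, 0.0, 0.5])  # object in the original frame
T_world_camera = to_homogeneous(np.eye(3), [0.0, 0.0, 1.0])  # current frame in the original frame

# Transformation of the object relative to the current frame
T_camera_object = np.linalg.inv(T_world_camera) @ T_world_object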

TatianaResend commented 1 year ago

The previous approach is probably wrong. I think it is also necessary to take into account the transformation of the camera, not just that of the points. Using only the point transforms, the points end up expressed relative to the current camera position, but the image is still seen from the position of the first camera.

Result: the points appear within the image, but they seem to be inverted. I have already reviewed the matrix operations and they appear to be correct.

[Screenshot from 2023-01-21 07-31-24]

TatianaResend commented 1 year ago

> The previous approach is probably wrong. I think it is also necessary to take into account the transformation of the camera, not just that of the points. Using only the point transforms, the points end up expressed relative to the current camera position, but the image is still seen from the position of the first camera.
>
> Result: the points appear within the image, but they seem to be inverted. I have already reviewed the matrix operations and they appear to be correct.
>
> [Screenshot from 2023-01-21 07-31-24]

This approach was not correct either: taking into account both the camera rotation and the rotation of the points made them cancel out, so it was as if nothing had been applied.

TatianaResend commented 1 year ago

The correct approach is shown in the figure.

Note: the order in the .pose file is QW,QX,QY,QZ,x,y,z
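Given that order, one line of the .pose file could be parsed into a 4x4 transform roughly like this (a sketch; pose_line_to_matrix is a hypothetical helper, and SciPy expects quaternions in scalar-last order, so w must be moved to the end):

import numpy as np
from scipy.spatial.transform import Rotation

def pose_line_to_matrix(line):
    # One .pose line: qw qx qy qz x y z
    qw, qx, qy, qz, x, y, z = map(float, line.split())
    T = np.eye(4)
    T[:3, :3] = Rotation.from_quat([qx, qy, qz, qw]).as_matrix()  # scalar-last (x, y, z, w)
    T[:3, 3] = [x, y, z]
    return T

T = pose_line_to_matrix('1 0 0 0 0.1 0.2 0.3')  # identity rotation, small translation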

TatianaResend commented 1 year ago

What remains is to loop over the images to find the first one that contains all the objects; a sketch of that loop is below.
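(project_objects is a hypothetical helper returning the projected 2D corner points of every object for a given camera pose):

def first_image_with_all_objects(poses, width, height):
    # Return the index of the first pose whose projection keeps every
    # object corner inside the image bounds, or None if none does
    for idx, pose in enumerate(poses):
        corners = project_objects(pose)  # hypothetical: list of (x, y) points
        if all(0 <= x < width and 0 <= y < height for x, y in corners):
            return idx
    return None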

TatianaResend commented 1 year ago

When we first computed the min and max values to draw the bbox, it was from the first camera's perspective. When searching from other camera positions it can be the other way around, so min and max can end up swapped. To fix this, we need to check which value is larger first and swap them if necessary.
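A minimal way to normalise the corners before cropping (x0, y0, x1, y1 stand in for the projected corner coordinates):

# Projected corners can come out in either order, so sort each axis explicitly
x0, y0, x1, y1 = 34.7, 120.2, 18.3, 96.5  # illustrative values
xmin, xmax = sorted((x0, x1))
ymin, ymax = sorted((y0, y1))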

TatianaResend commented 1 year ago

The bbox points are not always in the same order [xmin ymax, xmax ymin]; sometimes ymax and ymin are switched. A simple approach in the program is to ignore these cases and try another image from another camera position.

In the particular case of scene 13, the 3rd image is not cropped well. But it is an isolated case, so it will only be improved if easy-to-implement ideas come up.

andrefdre commented 1 year ago

Instead of using the bbox directly to extract the object images, I do an intermediate step to calculate the width and height, so the centre will always be in the middle of the extracted images.

# Bbox width and height from the two projected corner points
width = bbox_2d[idx][1][0][0] - bbox_2d[idx][0][0][0]
height = bbox_2d[idx][0][0][1] - bbox_2d[idx][1][0][1]
# Crop a width x height window centred on the object's 2D centre point
cropped_image = image[round(point_2d[0][1]-height/2):round(point_2d[0][1]+height/2),
                      round(point_2d[0][0]-width/2):round(point_2d[0][0]+width/2)]

TatianaResend commented 1 year ago

The problem is solved. I also added a search for the best image of each object, taking into account the overlap between images (with an adaptation of the IoU method).
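The adapted method itself is not shown in the thread; a standard IoU for two axis-aligned boxes given as (xmin, ymin, xmax, ymax) tuples looks roughly like this:

def iou(box_a, box_b):
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of the areas minus the intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0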