NVlabs / FoundationPose

[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
https://nvlabs.github.io/FoundationPose/

Real-time processing for driller pose estimation #25

Closed JRvilanova closed 7 months ago

JRvilanova commented 7 months ago

Thanks for such an amazing job!

I am currently trying to use a RealSense D455 camera in a real-time application; however, I get no pose estimation with my own Kinect driller. For this purpose I am using the mask and the .obj from the kinect_driller_seq that you suggest in the repo. I also updated the K matrix to match my camera. I have looked at the depth values provided in the dataset and they are completely different from what I get from the RealSense (the depth images from the dataset look completely black). Could this have any impact?

What could go wrong or how can i change the way i am approaching this application?

Thanks a lot in advance!

unnamed333user commented 7 months ago

Hi @JRvilanova, for running with your own custom images (from the RealSense camera), you should draw the mask for the first image in color_files (the list of all images in your folder); you can see color_files in the YcbineoatReader class. The depth values in the driller demo (captured with a Kinect) range from 0 to 7, but the range from the RealSense is very different. I'm not sure whether that leads to bad performance when using a RealSense camera or not; I'm waiting for @wenbowen123's response.
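If it helps, here is a minimal sketch of one way to produce that first-frame mask by hand with OpenCV. All file paths are placeholders, and a rectangular mask is coarse; a tighter segmentation mask is likely better.

```python
import cv2
import numpy as np

# Paths are placeholders; point them at your first color frame and the
# masks folder your reader expects.
first_rgb = cv2.imread('demo_data/my_driller/rgb/0.png')

# Draw a rough bounding box around the object; press ENTER to confirm.
x, y, w, h = cv2.selectROI('select object', first_rgb, showCrosshair=True)
cv2.destroyAllWindows()

# Binary mask: 255 inside the box, 0 elsewhere.
mask = np.zeros(first_rgb.shape[:2], dtype=np.uint8)
mask[y:y + h, x:x + w] = 255
cv2.imwrite('demo_data/my_driller/masks/0.png', mask)
```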

wenbowen123 commented 7 months ago

The depth image is saved as a PNG in uint16 at millimeter scale. To verify, you can increase the debug arg to 3; it will then save some PLY files to the debug folder, and you can check whether those point clouds make sense.
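As a quick sanity check of that convention (a sketch; the file path below is a placeholder), you can load one of your depth PNGs and inspect its dtype and value range:

```python
import cv2
import numpy as np

# Path is a placeholder for one of your saved depth frames.
depth = cv2.imread('demo_data/my_driller/depth/0.png', cv2.IMREAD_UNCHANGED)

print(depth.dtype, depth.shape)   # expect uint16 and a single-channel (H, W) image
print(depth.min(), depth.max())   # raw values in millimeters

# Converted to meters, a tabletop scene should land roughly in the 0.3-3 m range.
depth_m = depth.astype(np.float32) / 1000.0
print(depth_m[depth_m > 0].min(), depth_m.max())
```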

unnamed333user commented 7 months ago

@wenbowen123 the depth image from the RealSense camera is already uint16 and in millimeter scale. I also checked some of the output PLY files in the debug folder; everything looks normal. I can send you my custom data for testing if needed.

JRvilanova commented 7 months ago

I tried what you suggested (including adding the mask manually) and the results are not good, neither in the .ply file nor in the pose estimation. I attach photos of the image with the pose (the one with a little point) and the mask. I am currently using the .obj from the dataset. Could this be an issue?

[attached: pose-overlay and mask images]

wenbowen123 commented 7 months ago

@unnamed333user @JRvilanova if you upload the debug folder, I can take a look. @JRvilanova, if the PLY does not look right, it's likely there is still something wrong with the depth image.

abhishekmonogram commented 7 months ago

@JRvilanova The usual issue I have observed is that the scale of the CAD object is larger by a factor of 1000. As @wenbowen123 suggested, it's easier to diagnose if you load the .ply from the debug folder alongside your CAD object and check whether the scales match up. Please make sure the scale is the same and give it a try; that should resolve it.
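A quick way to compare the two scales is sketched below, assuming trimesh is available; the paths are placeholders for your mesh and for a PLY from the debug folder.

```python
import trimesh

# Paths are placeholders.
mesh = trimesh.load('demo_data/my_driller/mesh/model.obj')
cloud = trimesh.load('debug/scene_raw.ply')

# If the mesh is in millimeters and the scene in meters, these extents
# will differ by roughly a factor of 1000.
print('mesh extents :', mesh.extents)
print('cloud extents:', cloud.bounds[1] - cloud.bounds[0])

# One possible fix: rescale the mesh to meters before exporting it.
# mesh.apply_scale(0.001)
# mesh.export('model_in_meters.obj')
```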

unnamed333user commented 7 months ago

@wenbowen123 @abhishekmonogram for me everything is normal. Here is my debug folder: https://drive.google.com/file/d/1OaOUCv60U3pN9BebmC6vhJxxS6okWxtW/view?usp=sharing

wenbowen123 commented 7 months ago

@unnamed333user I don't have permission to access the Google Drive file.

unnamed333user commented 7 months ago

> @unnamed333user I don't have permission to access the Google Drive file.

Let's try that link again.

wenbowen123 commented 7 months ago

@unnamed333user I'm in contact with Mona on email (this seems to be the same data), please coordinate with her.

unnamed333user commented 7 months ago

Is your issue solved, @JRvilanova?

JRvilanova commented 7 months ago

No, I am still getting the same error; I haven't solved it. I see there is a huge difference between the depth.png I get with the demo data and the one I get with my own data. Here is a link to my debug folder: https://drive.google.com/drive/folders/1313FyxSBlh0fu4PPcB68oow0In_uWiIr?usp=sharing

Moreover, this is the code I am using to convert the RealSense depth data (when saved to .png it has 3 channels).

```python
depth = reader.get_depth(i)
depth = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
depth = cv2.cvtColor(depth, cv2.COLOR_BGR2GRAY)
```

Another possible source of the problem could be the K matrix. Should I use the one from the depth camera or the color camera?
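For what it's worth, the usual setup for RGB-D pose estimation is to align depth to the color stream and then use the color camera's intrinsics. A sketch of reading them with pyrealsense2 and writing them as a 3x3 matrix (which, as far as I can tell, the demo sequences keep in a cam_K.txt file; the output path below is a placeholder):

```python
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
profile = pipeline.start(config)

# Intrinsics of the color stream (use these if depth is aligned to color).
intr = profile.get_stream(rs.stream.color).as_video_stream_profile().get_intrinsics()
K = np.array([[intr.fx, 0.0,     intr.ppx],
              [0.0,     intr.fy, intr.ppy],
              [0.0,     0.0,     1.0]])

np.savetxt('demo_data/my_driller/cam_K.txt', K)  # path is a placeholder
pipeline.stop()
```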

unnamed333user commented 7 months ago

Hi @JRvilanova, try my code below to grab the RGB-D images from the RealSense. You don't have to convert anything, because the depth image from the RealSense camera is already uint16 and in millimeters (the same format as the depth image from the Kinect).

```python
import os

import cv2
import numpy as np
import pyrealsense2 as rs

# Output folders (placeholders; the original snippet assumed these already exist).
depth_dir = 'depth'
rgb_dir = 'rgb'
os.makedirs(depth_dir, exist_ok=True)
os.makedirs(rgb_dir, exist_ok=True)

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

pipeline.start(config)

i = 0
try:
    while True:
        # Wait for a coherent pair of frames: depth and color
        frames = pipeline.wait_for_frames()
        depth_frame = frames.get_depth_frame()
        color_frame = frames.get_color_frame()

        # Convert depth frame to a numpy array (uint16, millimeters)
        depth_image = np.asanyarray(depth_frame.get_data())
        cv2.imwrite(os.path.join(depth_dir, '{}.png'.format(i)), depth_image)

        # Convert color frame to a numpy array
        color_image = np.asanyarray(color_frame.get_data())
        cv2.imwrite(os.path.join(rgb_dir, '{}.png'.format(i)), color_image)

        # Display the color and depth images (optional)
        cv2.imshow('Color Image', color_image)
        cv2.imshow('Depth Image', depth_image)

        # Break the loop if 'q' is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

        i += 1
finally:
    pipeline.stop()
    cv2.destroyAllWindows()
```
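One thing the snippet above does not do is register depth to color; if the two streams are not aligned, the saved depth pixels will not correspond to the RGB pixels. A possible addition (just a sketch, not part of the original code) using pyrealsense2's align helper:

```python
import pyrealsense2 as rs

# Create the align object once, before the capture loop.
align = rs.align(rs.stream.color)

# Then, inside the loop, align each frameset before extracting the frames:
#   frames = pipeline.wait_for_frames()
#   frames = align.process(frames)
#   depth_frame = frames.get_depth_frame()
#   color_frame = frames.get_color_frame()
```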

My intrinsic setting is:

```
6.065820007324218750e+02 0.000000000000000000e+00 3.202149847676955687e+02
0.000000000000000000e+00 6.061186828613281250e+02 2.443486680871046701e+02
0.000000000000000000e+00 0.000000000000000000e+00 1.000000000000000000e+00
```

In my test case, it generates a pose for the first frame and tracks quite well for a few frames after that, but when I rotate the object a bit, the model can't track it anymore. @wenbowen123 I think you'd better check my debug folder in the link I sent above.

JRvilanova commented 7 months ago

I tried your code and I get better results, but I am struggling in the same way as you: the starting poses are much better than the ones 10 seconds later. I uploaded the new debug files I get to the shared folder.

Thanks a lot @unnamed333user for the code. I was using it to try real time; however, when I try to do the pose estimation with these images, the program asks me to use 'uint8' because 'uint16' is not supported. Do you have any hint as to why that could be happening?

trungpham2606 commented 7 months ago

Hi @JRvilanova, I didn't run into the issue you mentioned :O Glad that your result is better than before, but I think we are still missing something here, because the object we are testing was in the training set, so there's no way the result should be that bad.

wenbowen123 commented 7 months ago

@JRvilanova you did not use the correct driller model. You need to get a 3D model of your test driller (e.g. with BundleSDF). Or you can try the model-free setup.

wenbowen123 commented 7 months ago

@trungpham2606 none of the testing objects are from the training set. All the failures so far are because of something being set up incorrectly.

wenbowen123 commented 7 months ago

@unnamed333user are you a colleague of Mona? As I mentioned, she emailed me the same data.

wenbowen123 commented 7 months ago

In vis_score.png, the first column is the rendered view of your model. The second column is the test image. If the two objects are different, chances are that the wrong model is used.

unnamed333user commented 7 months ago

> @unnamed333user are you a colleague of Mona? As I mentioned, she emailed me the same data.

No. Besides, I'm testing with the potted meat can while Mona is testing the driller. I am using the model from the YCB webpage. In the initial frame the pose is correct, but after that it can't track :)

JRvilanova commented 7 months ago

Adding a .obj file of the object allowed me to detect the object correctly under very specific conditions, which solves what I asked in this issue. I will close it and open a new one about the poor performance after the object leaves the scene or after fast movements.

Thanks so much for your help guys!

eunseon02 commented 7 months ago

Hi @JRvilanova, could you tell me how you generated the mask?

abhishekmonogram commented 7 months ago

> Hi @JRvilanova, could you tell me how you generated the mask?

There are multiple ways to go about it; it could be done with SAM or any other segmentation network.
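In case it helps, here is a minimal sketch with Segment Anything; the checkpoint path, image paths, and click coordinates are all placeholders, and any other segmentation method works just as well.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load SAM (checkpoint path is a placeholder).
sam = sam_model_registry['vit_h'](checkpoint='sam_vit_h_4b8939.pth')
predictor = SamPredictor(sam)

# First color frame of the sequence (path is a placeholder).
image = cv2.cvtColor(cv2.imread('rgb/0.png'), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single positive click on the object (coordinates are placeholders).
masks, scores, _ = predictor.predict(point_coords=np.array([[320, 240]]),
                                     point_labels=np.array([1]))

# Keep the highest-scoring mask and save it as a binary PNG.
mask = (masks[np.argmax(scores)] * 255).astype(np.uint8)
cv2.imwrite('masks/0.png', mask)
```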

wenbowen123 commented 7 months ago

> > @unnamed333user are you a colleague of Mona? As I mentioned, she emailed me the same data.
>
> No. Besides, I'm testing with the potted meat can while Mona is testing the driller. I am using the model from the YCB webpage. In the initial frame the pose is correct, but after that it can't track :)

Your link does not work.

AdventurerDXC commented 6 months ago

@wenbowen123 I think I've met the same problem. I'm sure the camera K matrix is correct, the CAD model is in mm units, and the depth images correspond to the RGB images, but the pose estimates are wrong. I also noticed my depth values do not align with the ones you've given. Here are my depth and debug zip files: https://drive.google.com/file/d/1fczfo5E23rul2de9i7MJDjATR9L-RRuc/view?usp=drive_link https://drive.google.com/file/d/1LE28BpwBx4eOH8oypGaXvkjeLcqniTzp/view?usp=sharing

wenbowen123 commented 6 months ago

@AdventurerDXC your depth image is in the wrong format. You can visualize its generated point cloud, scene_raw.ply, in the debug folder.
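A quick way to look at that file (a sketch, assuming Open3D is installed; the path is a placeholder for your own debug folder):

```python
import open3d as o3d

# Path is a placeholder for the debug folder of your run.
pcd = o3d.io.read_point_cloud('debug/scene_raw.ply')
print(pcd)  # prints the number of points

# If the depth format is correct, the scene should appear at a plausible
# metric scale (a tabletop scene spans roughly a meter, not thousands).
o3d.visualization.draw_geometries([pcd])
```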