NVIDIA-ISAAC-ROS / isaac_ros_pose_estimation

Deep learned, NVIDIA-accelerated 3D object pose estimation
https://developer.nvidia.com/isaac-ros-gems
Apache License 2.0

Differences in the pose output between the ROS node and inference script #32

Closed LukasBergs closed 7 months ago

LukasBergs commented 8 months ago

Hello,

I am currently training a custom DOPE model on a battery pack object, using a dataset generated with NVIDIA Isaac Sim. Training was successful, and the inference results from the script (https://github.com/andrewyguo/dope_training/tree/master/inference) look promising.

However, when I use the trained model in the ROS2 node for real-time pose estimation, I encounter issues with the pose results. While the orientation appears correct, the position is significantly off, leading to inaccurate pose estimates.

Supplementary material:

Example training image:

[image: 000000]

JSON file:

{"camera_data": {},
  "objects": [
    {
      "class": "battery_pack",
      "visibility": 0.9151999950408936,
      "location": [
        5.486481666564941,
        -18.360801696777344,
        114.90338134765625
      ],
      "quaternion_xyzw": [
        0.013374784961342812,
        0.26406049728393555,
        0.9631786346435547,
        -0.04878578335046768
      ],
      "projected_cuboid": [
        [
          -1.0563220977783203,
          133.28067016601562
        ],
        [
          589.8880615234375,
          -201.9678192138672
        ],
        [
          765.4427490234375,
          59.740596771240234
        ],
        [
          160.30252075195312,
          420.7066345214844
        ],
        [
          52.57433319091797,
          164.1396942138672
        ],
        [
          546.505859375,
          -118.38485717773438
        ],
        [
          691.6463012695312,
          102.5290756225586
        ],
        [
          187.83038330078125,
          403.00537109375
        ],
        [
          356.67864990234375,
          117.25300598144531
        ]
      ]
    }
  ]
}

Inference result using the Python script:

[image: image_1]

JSON file:

{"camera_data": {},
  "objects": [
    {
      "class": "018_battery_pack",
      "location": [
        8.30719852171258,
        -3.7302530657860355,
        111.54764343059993
      ],
      "quaternion_xyzw": [
        0.9906192538625518,
        -0.06047349551092885,
        0.10690155339138838,
        0.05990415761864111
      ],
      "projected_cuboid": [
        [
          644.6524571927894,
          299.88523162941783
        ],
        [
          114.78411772282288,
          380.1847342389042
        ],
        [
          88.43919536214997,
          101.33316171037032
        ],
        [
          601.8356279350073,
          73.50575090038637
        ],
        [
          576.7778858351825,
          302.9538820509989
        ],
        [
          130.0462053521353,
          369.32050422005074
        ],
        [
          107.65703767437901,
          139.2413289399362
        ],
        [
          542.5726406234933,
          109.79893630537208
        ],
        [
          371.2373581103872,
          218.09222749128372
        ]
      ]
    }
  ]
}

config_pose.yaml:

input_is_rectified: True
downscale_height: 512
dimensions: {
    "003_cracker_box": [16.403600692749023,21.343700408935547,7.179999828338623],
    "009_gelatin_box": [8.918299674987793, 7.311500072479248, 2.9983000755310059],
    "010_potted_meat_can": [10.164673805236816,8.3542995452880859,5.7600898742675781],
    "006_mustard_bottle": [9.6024150848388672,19.130100250244141,5.824894905090332],
    "005_tomato_soup_can": [6.7659378051757813,10.185500144958496,6.771425724029541],
    "004_sugar_box": [9.267730712890625,17.625339508056641,4.5134143829345703],
    "021_bleach_cleanser": [10.267730712890625,26.625339508056641,7.5134143829345703],
    "071_nine_hole_peg_test": [12.6,3.9,12.6],
    "008_pudding_box": [49.47199821472168, 29.923000335693359, 83.498001098632812],
    "017_t_connector": [5.09, 5.306, 15.24],
    "018_battery_pack": [92, 43, 20.3],
}
class_ids: {
    "003_cracker_box": 1,
    "009_gelatin_box": 2,
    "010_potted_meat_can": 3,
    "006_mustard_bottle": 4,
    "005_tomato_soup_can": 5,
    "004_sugar_box": 6,
    "021_bleach_cleanser": 7,
    "071_nine_hole_peg_test": 8,
    "008_pudding_box": 10,
    "017_t_connector": 17,
    "018_battery_pack": 18,
}
draw_colors: {
    "003_cracker_box": [13, 255, 128],  # green
    "009_gelatin_box": [255, 255, 255],  # white
    "010_potted_meat_can": [0, 104, 255],  # blue
    "006_mustard_bottle": [217, 12, 232],  # magenta
    "005_tomato_soup_can": [255, 101, 0],  # orange
    "004_sugar_box": [232, 222, 12],  # yellow
    "021_bleach_cleanser": [232, 222, 12],  # yellow
    "071_nine_hole_peg_test": [232, 222, 12],  # yellow
    "008_pudding_box": [255, 0, 0],  # red
    "017_t_connector": [0, 104, 255],  # blue
    "018_battery_pack": [255, 101, 0],  # orange
}
thresh_angle: 0.5
thresh_map: 0.0001
sigma: 3
thresh_points: 0.1

camera_info.yaml:

image_width: 640
image_height: 480
camera_name: dope_webcam_0
distortion_model: plumb_bob
distortion_coefficients:
  rows: 1
  cols: 5
  data: [0.1623205691576004, -0.506430983543396, -0.002417039591819048, 5.408366632764228e-05, 0.4629184603691101]
rectification_matrix:
  rows: 3
  cols: 3
  data: [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
projection_matrix:
  rows: 3
  cols: 4
  data: [602.5180053710938, 0.0, 326.36651611328125, 0.0, 0.0, 602.6580200195312, 238.2456512451172, 0.0, 0.0, 0.0, 1.0, 0.0]
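
For reference, the 9-element camera_matrix that the ROS2 DOPE config below expects should be the left 3x3 block of this projection_matrix. A minimal sketch of that conversion (file name and key names assumed to match the YAML above):

import yaml
import numpy as np

# Load the calibration shown above (file name assumed).
with open("camera_info.yaml") as f:
    info = yaml.safe_load(f)

# For a rectified monocular camera, the left 3x3 block of the 3x4 projection
# matrix is the intrinsic matrix K = [fx 0 cx; 0 fy cy; 0 0 1].
P = np.array(info["projection_matrix"]["data"]).reshape(3, 4)
K = P[:, :3]

print(K.flatten().tolist())
# [602.5180053710938, 0.0, 326.36651611328125,
#  0.0, 602.6580200195312, 238.2456512451172,
#  0.0, 0.0, 1.0]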

ROS2 node pose topic output:

header:
  stamp:
    sec: 1703251435
    nanosec: 915388428
  frame_id: kairosAB_camera_color_optical_frame
poses:
- position:
    x: 0.004601923839252383
    y: -0.009230673242934324
    z: 0.07385986195806994
  orientation:
    x: 0.9969231703121176
    y: -0.06701344304282414
    z: -0.023362472894783784
    w: 0.03328041175561196
---

ROS2 DOPE config file:

dope:
  ros__parameters:
    # Cuboid dimension in cm x,y,z
    dimensions: {
      "cracker": [16.403600692749023,21.343700408935547,7.179999828338623],
      "gelatin": [8.918299674987793, 7.311500072479248, 2.9983000755310059],
      "meat": [10.164673805236816,8.3542995452880859,5.7600898742675781],
      "mustard": [9.6024150848388672,19.130100250244141,5.824894905090332],
      "soup": [6.7659378051757813,10.185500144958496,6.771425724029541],
      "sugar": [9.267730712890625,17.625339508056641,4.5134143829345703],
      "bleach": [10.267730712890625,26.625339508056641,7.5134143829345703],

      # HOPE objects
      "AlphabetSoup" : [ 8.3555002212524414, 7.1121001243591309, 6.6055998802185059 ],
      "Butter" : [ 5.282599925994873, 2.3935999870300293, 10.330100059509277 ],
      "Ketchup" : [ 14.860799789428711, 4.3368000984191895, 6.4513998031616211 ],
      "Pineapple" : [ 5.7623000144958496, 6.95989990234375, 6.567500114440918 ],
      "BBQSauce" : [ 14.832900047302246, 4.3478999137878418, 6.4632000923156738 ],
      "MacaroniAndCheese" : [ 16.625600814819336, 4.0180997848510742, 12.350899696350098 ],
      "Popcorn" : [ 8.4976997375488281, 3.825200080871582, 12.649200439453125 ],
      "Mayo" : [ 14.790200233459473, 4.1030998229980469, 6.4541001319885254 ],
      "Raisins" : [ 12.317500114440918, 3.9751999378204346, 8.5874996185302734 ],
      "Cherries" : [ 5.8038997650146484, 7.0907998085021973, 6.6101999282836914 ],
      "Milk" : [ 19.035800933837891, 7.326200008392334, 7.2154998779296875 ],
      "SaladDressing" : [ 14.744099617004395, 4.3695998191833496, 6.403900146484375 ],
      "ChocolatePudding" : [ 4.947199821472168, 2.9923000335693359, 8.3498001098632812 ],
      "Mushrooms" : [ 3.3322000503540039, 7.079899787902832, 6.5869998931884766 ],
      "Spaghetti" : [ 4.9836997985839844, 2.8492999076843262, 24.988100051879883 ],
      "Cookies" : [ 16.724300384521484, 4.015200138092041, 12.274600028991699 ],
      "Mustard" : [ 16.004999160766602, 4.8573999404907227, 6.5132999420166016 ],
      "TomatoSauce" : [ 8.2847003936767578, 7.0198001861572266, 6.6469998359680176 ],
      "Corn" : [ 5.8038997650146484, 7.0907998085021973, 6.6101999282836914 ],
      "OrangeJuice" : [ 19.248300552368164, 7.2781000137329102, 7.1582999229431152 ],
      "Tuna" : [ 3.2571001052856445, 7.0805997848510742, 6.5837001800537109 ],
      "CreamCheese" : [ 5.3206000328063965, 2.4230999946594238, 10.359000205993652 ],
      "Parmesan" : [ 10.286199569702148, 6.6093001365661621, 7.1117000579833984 ],
      "Yogurt" : [ 5.3677000999450684, 6.7961997985839844, 6.7915000915527344 ],
      "GranolaBars" : [ 12.400600433349609, 3.8738000392913818, 16.53380012512207 ],
      "Peaches" : [ 5.7781000137329102, 7.0961999893188477, 6.5925998687744141 ],
      "GreenBeans" : [ 5.758699893951416, 7.0608000755310059, 6.5732002258300781 ],
      "PeasAndCarrots" : [ 5.8512001037597656, 7.0636000633239746, 6.5918002128601074 ],

      # Custom objects
      "t_connector" : [ 5.09, 5.306, 15.24 ],
      "battery_pack" : [ 92.0, 43.0, 20.3 ],

    }

    # 9 element camera matrix (assuming 640x480 image)
    # camera_matrix: [463.51, 0.0, 321.652, 0.0, 616.44, 232.260, 0.0, 0.0, 1.0]
    camera_matrix: [602.5180053710938, 0.0, 326.36651611328125, 0.0, 602.6580200195312, 238.2456512451172, 0.0, 0.0, 1.0]

In both the training data and the inference result from the Python script, the object is about 100 cm away from the camera in the z direction, which seems correct. When running the ROS2 node, however, the pose topic reports the object only about 0.07 m from the camera, which is far too close. What could cause this issue?
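
For context, the translation DOPE recovers comes from a PnP fit of the detected cuboid keypoints against the cuboid dimensions configured for the object, so an undersized dimensions entry shrinks the recovered depth by the same factor. A minimal sketch of this effect (assuming a solvePnP-based decoder, as in the original DOPE inference code, and the intrinsics from camera_info.yaml above; the simulated detection is illustrative):

import cv2
import numpy as np

# Intrinsics from camera_info.yaml above.
K = np.array([[602.518, 0.0, 326.366],
              [0.0, 602.658, 238.246],
              [0.0, 0.0, 1.0]])

def corners(dims_cm):
    """8 corners of a cuboid centered at the origin (units: cm)."""
    x, y, z = (d / 2.0 for d in dims_cm)
    return np.array([(sx * x, sy * y, sz * z)
                     for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                    dtype=np.float64)

battery = [92.0, 43.0, 20.3]              # the battery_pack dimensions entry
undersized = [d / 10.0 for d in battery]  # a cuboid 10x smaller (Butter-sized)

# Simulate a detection: project the battery pack 110 cm in front of the camera.
image_points, _ = cv2.projectPoints(
    corners(battery), np.zeros(3), np.array([0.0, 0.0, 110.0]), K, None)

# Solve PnP once with the correct cuboid and once with the undersized one.
for dims in (battery, undersized):
    ok, rvec, tvec = cv2.solvePnP(corners(dims), image_points, K, None)
    print(f"z = {tvec[2, 0]:.1f} cm")
# z = 110.0 cm  (correct dimensions)
# z = 11.0 cm   (undersized dimensions: the depth collapses by the same factor)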

I would greatly appreciate any guidance or assistance in debugging and resolving this issue. Thank you in advance for your time and support. If additional information is needed for further analysis, please let me know.

Best, Lukas

LukasBergs commented 8 months ago

Update:

I had initially been using the dimensions of the Butter object, which made the estimated object appear much smaller than expected. Correcting the object_name parameter resolved that, but the results still differ from those obtained with the inference script.
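
In other words, the object_name parameter must match the key in the dimensions map so the decoder solves PnP with the right cuboid size. A minimal illustration, reusing names from the configs above (where exactly object_name is set depends on the launch setup):

dope:
  ros__parameters:
    # Must match a key in the dimensions map; with "Butter" here the decoder
    # would fit a roughly 10 cm cuboid instead of a roughly 92 cm one.
    object_name: "battery_pack"
    dimensions: {
      "battery_pack": [92.0, 43.0, 20.3],  # cm
    }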

The disparity between the two sets of results is demonstrated in the following videos:

Using inference script:

https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_pose_estimation/assets/56199842/452cf7b7-34af-4741-88ac-b38555f799f1

Result: The inference results look good for most of the frames.

Using ROS2 node on synthetic data:

https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_pose_estimation/assets/56199842/769cc501-ebea-4229-89bb-c2b55147343e

Result: I cannot guarantee that the 3D bounding box drawn in the image is correct. Nevertheless, the 3D pose shows strange rotations.

Using ROS2 node on real data:

https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_pose_estimation/assets/56199842/78f2cb70-7f31-4318-a9ce-6f6b1818551d

Result: The distance to the camera in the z direction looks fine, but the rotation is off.

Any ideas what could go wrong?

Best, Lukas

jaiveersinghNV commented 7 months ago

Hi @LukasBergs ,

I've attached a document that outlines the process we use to validate DOPE inference quality with the toy Ketchup model. Could you please walk through these steps and verify whether you can match the inference script and ROS 2 node results for that model?

dope-verification.md

Once we're sure that you're able to produce correct results with the NVIDIA-provided model, we can isolate where the error might be with your custom model.

LukasBergs commented 7 months ago

Hi @jaiveersinghNV ,

Thank you for guiding me with the provided markdown file!

While re-evaluating the specified image and network dimensions, I identified an issue with the launch parameter "output_height" in the launch file.

It seems that both parameters, output_width and output_height, are assigned the value of network_image_width. I assume this is a mistake. Could you please verify?
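
For completeness, this is roughly the wiring in question; the package and plugin names below are illustrative placeholders and should be checked against the actual isaac_ros_dope launch file:

from launch.substitutions import LaunchConfiguration
from launch_ros.descriptions import ComposableNode

network_image_width = LaunchConfiguration('network_image_width')
network_image_height = LaunchConfiguration('network_image_height')

# Illustrative resize-node entry; only the width/height wiring is the point.
resize_node = ComposableNode(
    name='dope_image_resize',
    package='isaac_ros_image_proc',
    plugin='nvidia::isaac_ros::image_proc::ResizeNode',
    parameters=[{
        'output_width': network_image_width,
        # This was previously also set to network_image_width by mistake.
        'output_height': network_image_height,
    }])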

After updating the parameter to network_image_height, my results align more closely with those from the inference Python script. For the time being, I am closing the issue.

Thank you for your help!