NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Issues with 3D Bounding Box Detection Accuracy in Custom Dope Model #376

Open phsilvarepo opened 4 months ago


Hello,

I am currently trying to train a custom DOPE model to detect a custom .obj model, but I am struggling to get good results at inference time. I generated a 75k-image dataset and trained with train.py from the train2 folder of the current repository for 100 epochs. Although the loss has decreased, inference struggles to accurately estimate the dimensions of the 3D bounding box. I am not sure what could be improved to attain better results; perhaps increasing the scale of the object in the generated dataset would help.

On the inference side, I copied the files from the common folder into train2 because the cuboid module was not being found, and I also commented out make_belief_debug_img=True on line 264 of inference.py. The center of the object is detected relatively well, but the rest of the bounding box does not perform well.
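As a side note, instead of copying the files from common into train2, it may be cleaner to put the common folder on the Python path before the imports in inference.py. A minimal sketch (the repository path here is an assumption based on the weights path in the config below; adjust it to your checkout):

```python
import os
import sys

# Assumed checkout location, inferred from the weights path in the config;
# change this to wherever your Deep_Object_Pose clone actually lives.
repo_root = os.path.expanduser("~/catkin_ws/src/Deep_Object_Pose")

# Make cuboid.py (and the other shared modules) importable from train2
# without duplicating files.
sys.path.insert(0, os.path.join(repo_root, "common"))
```

This keeps a single copy of the shared modules, so later fixes to cuboid.py apply everywhere.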

.obj model: m6.zip

Ground truth: 000018 000028 000034 000035 000038

Training run: Screenshot from 2024-07-26 14-54-15

Config file:

topic_camera: "/dope/webcam/image_raw"
topic_camera_info: "/dope/webcam/camera_info"
topic_publishing: "dope"
input_is_rectified: True   # Whether the input image is rectified (strongly suggested!)
downscale_height: 400      # if the input image is larger than this, scale it down to this pixel height

# Comment any of these lines to prevent detection / pose estimation of that object
weights: {
    'drill_6':"/home/rics/catkin_ws/src/Deep_Object_Pose/train2/tmp/net_epoch_100.pth"
}

# Type of neural network architecture
architectures: {
    'drill_6':"dope",
}

# Cuboid dimension in cm x,y,z
dimensions: {
    "drill_6": [1.14,1.14,11.3],
}

class_ids: {
    "drill_6": 1,
}

draw_colors: {
    "drill_6": [13, 255, 128],  # green
}

# optional: provide a transform that is applied to the pose returned by DOPE
model_transforms: {
}

# optional: if you provide a mesh of the object here, a mesh marker will be
# published for visualization in RViz
# You can use the nvdu_ycb tool to download the meshes: https://github.com/NVIDIA/Dataset_Utilities#nvdu_ycb
meshes: {
}

# optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0.
mesh_scales: {
}

# Config params for DOPE
thresh_angle: 0.5
thresh_map: 0.0001
sigma: 3
thresh_points: 0.1

Inference: 000017 000024 000034 000035 000062

I also attempted to adapt the parameters in camera_info.yaml, but the results did not change. Any help would be great. Thanks.
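Since the intrinsics from camera_info.yaml directly drive the PnP step, it can help to sanity-check them by projecting the configured cuboid with a plain pinhole model and seeing whether the corners land where you expect in the image. A minimal numpy sketch; the fx, fy, cx, cy values and the 50 cm distance below are placeholders, not values from the issue:

```python
import numpy as np

def cuboid_corners(dx, dy, dz):
    """8 corners of an axis-aligned cuboid centered at the origin,
    in the same units as the config's `dimensions` (cm here)."""
    return np.array([[sx * dx / 2, sy * dy / 2, sz * dz / 2]
                     for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])

def project(points_cm, K, t_cm):
    """Pinhole projection of 3D points translated by t_cm in front of the camera."""
    pts = points_cm + t_cm
    uv = (K @ pts.T).T
    return uv[:, :2] / uv[:, 2:3]

# Placeholder intrinsics; substitute fx, fy, cx, cy from your camera_info.yaml.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

corners = cuboid_corners(1.14, 1.14, 11.3)            # drill_6 dimensions from the config
uv = project(corners, K, np.array([0.0, 0.0, 50.0]))  # object assumed 50 cm away
```

If the projected footprint looks nothing like the boxes DOPE draws at a similar distance, the intrinsics (or the units of `dimensions`) are a likely culprit.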