Open monajalal opened 1 year ago
This looks promising :P Have you tried having the object a little further out from the camera? Can you set the threshold in the config to something smaller?
I'd recommend first repositioning the camera (or object) so that you look at it from an angle, not frontally. This might help getting more unique corner detections. For example, here are some good detections on my own data:
You can see that in both cases there is only one clear maximum in most belief maps.
On the other hand, here's an example that looks more like your picture. There are multiple maxima, and DOPE fails to recognize the object:
So to debug this, I'd recommend first finding a position of the camera / object so that the belief maps look like image 1 or 2. If you then still don't get any object detection results, it's usually because either the dimensions or (more rarely) the camera intrinsics are wrong.
Once you've gotten your first detection results, and they look halfway correct in 3D, you know that your dimensions parameter and camera intrinsics are correct, and you can then start evaluating the quality of the detections. On my own data the quality is not stellar for the cracker object, but it's good enough for me.
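As a quick sanity check of the dimensions and intrinsics (a sketch of my own, not part of DOPE itself), you can project the eight cuboid corners through a pinhole model and verify that they land roughly where the object appears in the image. The dimensions and intrinsics below are assumed example values (cracker-box-like dimensions in cm, RealSense-like intrinsics):

```python
import numpy as np

# Hypothetical values: cracker-box dimensions in cm and RealSense-like intrinsics.
dims = np.array([16.40, 21.34, 7.18])        # object extents x, y, z (cm)
fx, fy, cx, cy = 616.0, 616.3, 321.5, 235.9  # assumed pinhole intrinsics

# The 8 cuboid corners, centered on the object origin (cm).
corners = np.array([[sx, sy, sz] for sx in (-1, 1)
                    for sy in (-1, 1) for sz in (-1, 1)]) * dims / 2.0

# Assume the object sits 50 cm straight in front of the camera, unrotated.
cam_pts = corners + np.array([0.0, 0.0, 50.0])

# Pinhole projection: u = fx * X/Z + cx, v = fy * Y/Z + cy
u = fx * cam_pts[:, 0] / cam_pts[:, 2] + cx
v = fy * cam_pts[:, 1] / cam_pts[:, 2] + cy
for ui, vi in zip(u, v):
    print(f"({ui:6.1f}, {vi:6.1f})")  # all corners should land inside 640x480
```

If the projected box is wildly off-screen or the wrong size relative to the object in your image, the dimensions or intrinsics are the first thing to fix.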
Thank you @TontonTremblay and @mintar for your responses. Sorry for the late reply; I didn't have access to a camera.
So, I tried with this view and the results are even worse.
My new questions are: 1) shouldn't DOPE detect the object from all views? 2) what is the range of distances at which the object can be detected?
config_pose.yaml:
topic_camera: "/dope/webcam/image_raw"
topic_camera_info: "/dope/webcam/camera_info"
topic_publishing: "dope"
input_is_rectified: True # Whether the input image is rectified (strongly suggested!)
downscale_height: 400 # if the input image is larger than this, scale it down to this pixel height
# Comment any of these lines to prevent detection / pose estimation of that object
weights: {
"cracker": "/hdd/data/DOPE/FAT/DOPE_cracker_60epochs.pth",
}
# Type of neural network architecture
architectures: {
'cracker': "dope",
}
# Cuboid dimension in cm x,y,z
dimensions: {
"cracker": [16.403600692749023,21.343700408935547,7.179999828338623],
"gelatin": [8.918299674987793, 7.311500072479248, 2.9983000755310059],
"meat": [10.164673805236816,8.3542995452880859,5.7600898742675781],
"mustard": [9.6024150848388672,19.130100250244141,5.824894905090332],
"soup": [6.7659378051757813,10.185500144958496,6.771425724029541],
"sugar": [9.267730712890625,17.625339508056641,4.5134143829345703],
"bleach": [10.267730712890625,26.625339508056641,7.5134143829345703],
"peg_hole": [12.6,3.9,12.6],
'cube_red':[5,5,5],
'pudding':[49.47199821472168, 29.923000335693359, 83.498001098632812],
'alphabet_soup':[8.3555002212524414, 7.1121001243591309, 6.6055998802185059],
'pallet': [120, 80, 14.5]
}
class_ids: {
"cracker": 1,
"gelatin": 2,
"meat": 3,
"mustard": 4,
"soup": 5,
"sugar": 6,
"bleach": 7,
"peg_hole": 8,
"cube_red": 9,
'pudding': 10,
'alphabet_soup': 12,
'pallet': 13,
}
draw_colors: {
"cracker": [13, 255, 128], # green
"gelatin": [255, 255, 255], # white
"meat": [0, 104, 255], # blue
"mustard": [217,12, 232], # magenta
"soup": [255, 101, 0], # orange
"sugar": [232, 222, 12], # yellow
"bleach": [232, 222, 12], # yellow
"peg_hole": [232, 222, 12], # yellow
"cube_red": [255,0,0],
"pudding": [255,0,0],
"pallet": [229, 204, 255],
}
# optional: provide a transform that is applied to the pose returned by DOPE
model_transforms: {
# "cracker": [[ 0, 0, 1, 0],
# [ 0, -1, 0, 0],
# [ 1, 0, 0, 0],
# [ 0, 0, 0, 1]]
}
# optional: if you provide a mesh of the object here, a mesh marker will be
# published for visualization in RViz
# You can use the nvdu_ycb tool to download the meshes: https://github.com/NVIDIA/Dataset_Utilities#nvdu_ycb
meshes: {
# "cracker": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/003_cracker_box/google_16k/textured.obj",
# "gelatin": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/009_gelatin_box/google_16k/textured.obj",
# "meat": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/010_potted_meat_can/google_16k/textured.obj",
# "mustard": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/006_mustard_bottle/google_16k/textured.obj",
# "soup": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/005_tomato_soup_can/google_16k/textured.obj",
# "sugar": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/004_sugar_box/google_16k/textured.obj",
# "bleach": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/021_bleach_cleanser/google_16k/textured.obj",
}
# optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0.
mesh_scales: {
"cracker": 0.01,
"gelatin": 0.01,
"meat": 0.01,
"mustard": 0.01,
"soup": 0.01,
"sugar": 0.01,
"bleach": 0.01,
"pallet": 0.01,
}
# Config params for DOPE
thresh_angle: 0.5
thresh_map: 0.0001
sigma: 3
thresh_points: 0.1
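For intuition on what these last parameters control, here is a rough sketch (my own simplification, not DOPE's actual detector code) of belief-map peak extraction: the map is blurred with sigma, pixels whose blurred value falls below thresh_map are ignored, and the surviving local maxima are kept only if their belief value exceeds thresh_points:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def find_peaks(belief, sigma=3, thresh_map=0.0001, thresh_points=0.1):
    """Simplified sketch of DOPE-style peak extraction on one belief map.

    sigma         -- Gaussian blur applied before the local-maximum test
    thresh_map    -- minimum blurred value for a pixel to count at all
    thresh_points -- minimum belief value for a peak to be kept
    """
    blurred = gaussian_filter(belief, sigma=sigma)
    local_max = (maximum_filter(blurred, size=5) == blurred) & (blurred > thresh_map)
    ys, xs = np.nonzero(local_max)
    return [(x, y) for x, y in zip(xs, ys) if belief[y, x] > thresh_points]

# A toy 50x66 belief map (the size from the logs below) with one clear maximum:
belief = np.zeros((50, 66))
belief[20, 30] = 1.0
belief = gaussian_filter(belief, sigma=2)
belief /= belief.max()
print(find_peaks(belief))
```

A map with one clean blob yields exactly one peak; the "peaks len: 2" / "peaks len: 3" lines in the logs below mean some belief maps have multiple maxima competing.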
and camera_info.yaml:
image_width: 640
image_height: 480
camera_name: dope_webcam_0
camera_matrix:
rows: 3
cols: 3
data: [321.53, 0, 616.056, 0, 235.879, 616.279, 0, 0, 1]
distortion_model: plumb_bob
distortion_coefficients:
rows: 1
cols: 5
data: [0, 0, 0, 0, 0]
rectification_matrix:
rows: 3
cols: 3
data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
projection_matrix:
rows: 3
cols: 4
data: [321.534, 0, 616.056, 0, 0, 235.879, 616.279, 0, 0, 0, 1, 0]
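One thing worth double-checking, assuming the usual row-major [fx, 0, cx, 0, fy, cy, 0, 0, 1] layout of the 3x3 camera matrix: the principal point (cx, cy) should sit near the image center, i.e. around (320, 240) for a 640x480 image. A small sanity-check sketch (check_intrinsics is a hypothetical helper, not part of DOPE):

```python
# Sanity-check sketch for a ROS-style 3x3 camera matrix, assuming
# row-major layout [fx, 0, cx, 0, fy, cy, 0, 0, 1].
def check_intrinsics(data, width, height, tol=0.25):
    fx, _, cx, _, fy, cy, *_ = data
    ok_cx = abs(cx - width / 2) < tol * width
    ok_cy = abs(cy - height / 2) < tol * height
    return {"fx": fx, "fy": fy, "cx": cx, "cy": cy,
            "principal_point_plausible": ok_cx and ok_cy}

# The matrix from camera_info.yaml above:
print(check_intrinsics([321.53, 0, 616.056, 0, 235.879, 616.279, 0, 0, 1],
                       640, 480))
# cx=616 and cy=616 are far from (320, 240) -- fx/cx and fy/cy look swapped.
```

If the focal lengths and principal point really are transposed in the config, that alone could explain badly wrong poses even when the belief maps look fine.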
I have provided you with the rgb image if you would like to reproduce it.
I also tried on this other image and actually got an error. The image quality seems ok.
(dope) mona@ard-gpu-01:~/research/dope/scripts$ python train2/inference.py --data /home/mona/research/dope/scripts/train2/CRAKER_input/
current working dir: /home/mona/research/dope/scripts
output is located in out_experiment
videopath: /home/mona/research/dope/scripts/train2/CRAKER_input/
image_files: ['/home/mona/research/dope/scripts/train2/CRAKER_input/rgb_image-0474.png']
j is /home/mona/research/dope/scripts/train2/CRAKER_input/rgb_image-0474.png and video_path is:
{'cracker': '/hdd/data/DOPE/FAT/DOPE_cracker_60epochs.pth'}
*****
model is: cracker
config['architectures'] : {'cracker': 'dope'}
Loading DOPE model '/hdd/data/DOPE/FAT/DOPE_cracker_60epochs.pth'...
/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
warnings.warn(msg)
Model loaded in 3.730365753173828 seconds.
loaded
Ctrl-C to stop
opt data: /home/mona/research/dope/scripts/train2/CRAKER_input/
i_image is: 0
len(imgs): 1
frame rgb_image-0474.png
frame type is <class 'numpy.ndarray'> and frame length 480
vertex2 size[0]: 9
vertex2 size: torch.Size([9, 50, 66])
peaks len: 2
peaks len: 2
peaks len: 1
peaks len: 1
peaks len: 1
peaks len: 1
peaks len: 1
peaks len: 1
peaks len: 1
len(all_peaks[-1]): 1
nb_object: 0
all_peaks[-1][nb_object][2]: 0.17209956
objects are: [[[24.527286052723493, 20.809007834487684], [(150.49666454821275, 179.65313678439105), None, (227.5058116730267, 114.63728967454797), (166.31783097397746, 202.3121531854768), (141.0992805853221, 147.04254498881758), None, (238.36749829886324, 149.34804699866842), None], [(0.014524836353883845, 5.947961316033755), None, (0.09030593212982914, 7.568182579162676), (0.07528098046072895, 5.834366722440402), (0.09024037718890608, 7.305403803038851), None, (0.14436224398914713, 5.6868650829343474), None], 0.17209956, [lists of None values, truncated]]]
Traceback (most recent call last):
File "/home/mona/research/dope/scripts/train2/inference.py", line 420, in <module>
dope_node.image_callback(
File "/home/mona/research/dope/scripts/train2/inference.py", line 229, in image_callback
results, beliefs = ObjectDetector.detect_object_in_image(
File "/home/mona/research/dope/scripts/train2/inference/detector.py", line 430, in detect_object_in_image
detected_objects = ObjectDetector.find_object_poses(vertex2, aff, pnp_solver, config)
File "/home/mona/research/dope/scripts/train2/inference/detector.py", line 495, in find_object_poses
cuboid2d = np.copy(points)
File "<__array_function__ internals>", line 200, in copy
File "/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/numpy/lib/function_base.py", line 960, in copy
return array(a, order=order, subok=subok, copy=True)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.
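The ValueError itself comes from NumPy (1.24 and later) refusing to build a numeric array from a list that mixes (x, y) tuples with None entries for the undetected keypoints, which is exactly what `points` looks like here when not all nine cuboid keypoints are found. A minimal reproduction, with made-up coordinates:

```python
import numpy as np

# Sketch of what goes wrong in find_object_poses: `points` holds one entry
# per cuboid keypoint, but undetected keypoints are None, e.g.:
points = [(150.5, 179.7), None, (227.5, 114.6), (166.3, 202.3),
          (141.1, 147.0), None, (238.4, 149.3), None, (24.5, 20.8)]

try:
    cuboid2d = np.copy(points)  # NumPy >= 1.24 rejects this ragged input
except ValueError as e:
    print("fails:", e)

# One possible workaround (my assumption, not the official fix): keep the
# Nones by building an object array instead of a numeric one.
cuboid2d = np.array(points, dtype=object)
print(cuboid2d.shape)
```

So the error fires precisely when the detector finds a partial set of keypoints; with all nine keypoints detected (no None entries) the copy succeeds.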
Here's another failed example
(dope) mona@ard-gpu-01:~/research/dope/scripts$ python train2/inference.py --data /home/mona/research/dope/scripts/train2/CRAKER_input/
current working dir: /home/mona/research/dope/scripts
output is located in out_experiment
videopath: /home/mona/research/dope/scripts/train2/CRAKER_input/
image_files: ['/home/mona/research/dope/scripts/train2/CRAKER_input/rgb_image-0578.png']
j is /home/mona/research/dope/scripts/train2/CRAKER_input/rgb_image-0578.png and video_path is:
{'cracker': '/hdd/data/DOPE/FAT/DOPE_cracker_60epochs.pth'}
*****
model is: cracker
config['architectures'] : {'cracker': 'dope'}
Loading DOPE model '/hdd/data/DOPE/FAT/DOPE_cracker_60epochs.pth'...
/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
warnings.warn(msg)
Model loaded in 3.7484540939331055 seconds.
loaded
Ctrl-C to stop
opt data: /home/mona/research/dope/scripts/train2/CRAKER_input/
i_image is: 0
len(imgs): 1
frame rgb_image-0578.png
frame type is <class 'numpy.ndarray'> and frame length 480
vertex2 size[0]: 9
vertex2 size: torch.Size([9, 50, 66])
peaks len: 2
peaks len: 3
peaks len: 1
peaks len: 2
peaks len: 2
peaks len: 2
peaks len: 1
peaks len: 2
peaks len: 1
len(all_peaks[-1]): 1
nb_object: 0
all_peaks[-1][nb_object][2]: 0.14351296
objects are: [[[18.185712728531254, 39.08536894189304], [(115.17971180892843, 338.7598551275025), None, None, None, (91.95717682675497, 299.90573085486346), (186.69301298149816, 262.0359816047784), None, None], [(0.1464781172948242, 4.997589937935652), None, None, None, (0.14896906810366484, 6.879044670624101), (0.0417484347608397, 8.161607967727617), None, None], 0.14351296, [lists of None values, truncated]]]
Traceback (most recent call last):
File "/home/mona/research/dope/scripts/train2/inference.py", line 420, in <module>
dope_node.image_callback(
File "/home/mona/research/dope/scripts/train2/inference.py", line 229, in image_callback
results, beliefs = ObjectDetector.detect_object_in_image(
File "/home/mona/research/dope/scripts/train2/inference/detector.py", line 430, in detect_object_in_image
detected_objects = ObjectDetector.find_object_poses(vertex2, aff, pnp_solver, config)
File "/home/mona/research/dope/scripts/train2/inference/detector.py", line 495, in find_object_poses
cuboid2d = np.copy(points)
File "<__array_function__ internals>", line 200, in copy
File "/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/numpy/lib/function_base.py", line 960, in copy
return array(a, order=order, subok=subok, copy=True)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.
I would highly recommend using the weights we trained as a baseline. But as discussed in the paper, the results without domain randomization are not that great. If training your own model is important, I would recommend using nvisii to render your data. I never had great success using FAT.
Using cracker_60.pth I get the results at the end of this comment. Yes, I need to train on my own custom data, and before moving on to that I needed to make sure the training works. So, what data did you use to train cracker_60.pth? I was under the assumption that you used the entire FAT dataset. Could you please clarify what the data was? I need to be able to reproduce your training at least once before moving forward. sample 1 sample 2 sample 3
Yeah, if you read the paper, we used two datasets we generated with UE4. Right now I recommend that people use nvisii to generate data; a lot of people have had success with it. https://github.com/NVlabs/Deep_Object_Pose/tree/master/scripts/nvisii_data_gen
I trained DOPE on the entire FAT dataset for the cracker object and took this photo with an Intel RealSense camera. The results are not very good: no object is detected, only some heatmaps that are partially there. Any pointers on what can be changed in the inference code?
Any help is really appreciated.