Open monajalal opened 1 year ago
This looks promising :P Have you tried having the object a little further out from the camera? Can you set the threshold in the config to something smaller?
I'd recommend first repositioning the camera (or object) so that you look at it from an angle, not frontally. This might help getting more unique corner detections. For example, here are some good detections on my own data:
You can see that in both cases there is only one clear maximum in most belief maps.
On the other hand, here's an example that looks more like your picture. There are multiple maxima, and DOPE fails to recognize the object:
So to debug this, I'd recommend first finding a position of the camera / object so that the belief maps look like image 1 or 2. If you then still don't get any object detection results, it's usually because either the dimensions or (more rarely) the camera intrinsics are wrong.
Once you've gotten your first detection results, and they look halfway correct in 3D, you know that your dimensions parameter and camera intrinsics are correct, and you can then start evaluating the quality of the detections. On my own data the quality is not stellar for the cracker object, but it's good enough for me.
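As a quick sanity check of the dimensions and intrinsics (a sketch of my own, not part of DOPE itself), you can project the eight cuboid corners through a pinhole model and verify that they land roughly where the object appears in the image. The dimensions and intrinsics below are assumed example values (cracker-box-like dimensions in cm, RealSense-like intrinsics):

```python
import numpy as np

# Hypothetical values: cracker-box dimensions in cm and RealSense-like intrinsics.
dims = np.array([16.40, 21.34, 7.18])        # object extents x, y, z (cm)
fx, fy, cx, cy = 616.0, 616.3, 321.5, 235.9  # assumed pinhole intrinsics

# The 8 cuboid corners, centered on the object origin (cm).
corners = np.array([[sx, sy, sz] for sx in (-1, 1)
                    for sy in (-1, 1) for sz in (-1, 1)]) * dims / 2.0

# Assume the object sits 50 cm straight in front of the camera, unrotated.
cam_pts = corners + np.array([0.0, 0.0, 50.0])

# Pinhole projection: u = fx * X/Z + cx, v = fy * Y/Z + cy
u = fx * cam_pts[:, 0] / cam_pts[:, 2] + cx
v = fy * cam_pts[:, 1] / cam_pts[:, 2] + cy
for ui, vi in zip(u, v):
    print(f"({ui:6.1f}, {vi:6.1f})")  # all corners should land inside 640x480
```

If the projected box is wildly off-screen or the wrong size relative to the object in your image, the dimensions or intrinsics are the first thing to fix.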
Thank you @TontonTremblay and @mintar for your responses. Sorry for the late reply; I didn't have access to a camera.
So, I tried with this view and the results are even worse.
My new questions are: 1) shouldn't DOPE detect the object from all views? 2) what is the range of distances at which the object can be detected?
config_pose.yaml:
topic_camera: "/dope/webcam/image_raw"
topic_camera_info: "/dope/webcam/camera_info"
topic_publishing: "dope"
input_is_rectified: True # Whether the input image is rectified (strongly suggested!)
downscale_height: 400 # if the input image is larger than this, scale it down to this pixel height
# Comment any of these lines to prevent detection / pose estimation of that object
weights: {
"cracker": "/hdd/data/DOPE/FAT/DOPE_cracker_60epochs.pth",
}
# Type of neural network architecture
architectures: {
'cracker': "dope",
}
# Cuboid dimension in cm x,y,z
dimensions: {
"cracker": [16.403600692749023,21.343700408935547,7.179999828338623],
"gelatin": [8.918299674987793, 7.311500072479248, 2.9983000755310059],
"meat": [10.164673805236816,8.3542995452880859,5.7600898742675781],
"mustard": [9.6024150848388672,19.130100250244141,5.824894905090332],
"soup": [6.7659378051757813,10.185500144958496,6.771425724029541],
"sugar": [9.267730712890625,17.625339508056641,4.5134143829345703],
"bleach": [10.267730712890625,26.625339508056641,7.5134143829345703],
"peg_hole": [12.6,3.9,12.6],
'cube_red':[5,5,5],
'pudding':[49.47199821472168, 29.923000335693359, 83.498001098632812],
'alphabet_soup':[8.3555002212524414, 7.1121001243591309, 6.6055998802185059],
'pallet': [120, 80, 14.5]
}
class_ids: {
"cracker": 1,
"gelatin": 2,
"meat": 3,
"mustard": 4,
"soup": 5,
"sugar": 6,
"bleach": 7,
"peg_hole": 8,
"cube_red": 9,
'pudding': 10,
'alphabet_soup': 12,
'pallet': 13,
}
draw_colors: {
"cracker": [13, 255, 128], # green
"gelatin": [255, 255, 255], # white
"meat": [0, 104, 255], # blue
"mustard": [217,12, 232], # magenta
"soup": [255, 101, 0], # orange
"sugar": [232, 222, 12], # yellow
"bleach": [232, 222, 12], # yellow
"peg_hole": [232, 222, 12], # yellow
"cube_red": [255,0,0],
"pudding": [255,0,0],
"pallet": [229, 204, 255],
}
# optional: provide a transform that is applied to the pose returned by DOPE
model_transforms: {
# "cracker": [[ 0, 0, 1, 0],
# [ 0, -1, 0, 0],
# [ 1, 0, 0, 0],
# [ 0, 0, 0, 1]]
}
# optional: if you provide a mesh of the object here, a mesh marker will be
# published for visualization in RViz
# You can use the nvdu_ycb tool to download the meshes: https://github.com/NVIDIA/Dataset_Utilities#nvdu_ycb
meshes: {
# "cracker": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/003_cracker_box/google_16k/textured.obj",
# "gelatin": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/009_gelatin_box/google_16k/textured.obj",
# "meat": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/010_potted_meat_can/google_16k/textured.obj",
# "mustard": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/006_mustard_bottle/google_16k/textured.obj",
# "soup": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/005_tomato_soup_can/google_16k/textured.obj",
# "sugar": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/004_sugar_box/google_16k/textured.obj",
# "bleach": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/021_bleach_cleanser/google_16k/textured.obj",
}
# optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0.
mesh_scales: {
"cracker": 0.01,
"gelatin": 0.01,
"meat": 0.01,
"mustard": 0.01,
"soup": 0.01,
"sugar": 0.01,
"bleach": 0.01,
"pallet": 0.01,
}
# Config params for DOPE
thresh_angle: 0.5
thresh_map: 0.0001
sigma: 3
thresh_points: 0.1
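For intuition on what these last parameters control, here is a rough sketch (my own simplification, not DOPE's actual detector code) of belief-map peak extraction: the map is blurred with sigma, pixels whose blurred value falls below thresh_map are ignored, and the surviving local maxima are kept only if their belief value exceeds thresh_points:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def find_peaks(belief, sigma=3, thresh_map=0.0001, thresh_points=0.1):
    """Simplified sketch of DOPE-style peak extraction on one belief map.

    sigma         -- Gaussian blur applied before the local-maximum test
    thresh_map    -- minimum blurred value for a pixel to count at all
    thresh_points -- minimum belief value for a peak to be kept
    """
    blurred = gaussian_filter(belief, sigma=sigma)
    local_max = (maximum_filter(blurred, size=5) == blurred) & (blurred > thresh_map)
    ys, xs = np.nonzero(local_max)
    return [(x, y) for x, y in zip(xs, ys) if belief[y, x] > thresh_points]

# A toy 50x66 belief map (the size from the logs below) with one clear maximum:
belief = np.zeros((50, 66))
belief[20, 30] = 1.0
belief = gaussian_filter(belief, sigma=2)
belief /= belief.max()
print(find_peaks(belief))
```

A map with one clean blob yields exactly one peak; the "peaks len: 2" / "peaks len: 3" lines in the logs below mean some belief maps have multiple maxima competing.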
and camera_info.yaml:
image_width: 640
image_height: 480
camera_name: dope_webcam_0
camera_matrix:
rows: 3
cols: 3
data: [321.53, 0, 616.056, 0, 235.879, 616.279, 0, 0, 1]
distortion_model: plumb_bob
distortion_coefficients:
rows: 1
cols: 5
data: [0, 0, 0, 0, 0]
rectification_matrix:
rows: 3
cols: 3
data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
projection_matrix:
rows: 3
cols: 4
data: [321.534, 0, 616.056, 0, 0, 235.879, 616.279, 0, 0, 0, 1, 0]
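One thing worth double-checking, assuming the usual row-major [fx, 0, cx, 0, fy, cy, 0, 0, 1] layout of the 3x3 camera matrix: the principal point (cx, cy) should sit near the image center, i.e. around (320, 240) for a 640x480 image. A small sanity-check sketch (check_intrinsics is a hypothetical helper, not part of DOPE):

```python
# Sanity-check sketch for a ROS-style 3x3 camera matrix, assuming
# row-major layout [fx, 0, cx, 0, fy, cy, 0, 0, 1].
def check_intrinsics(data, width, height, tol=0.25):
    fx, _, cx, _, fy, cy, *_ = data
    ok_cx = abs(cx - width / 2) < tol * width
    ok_cy = abs(cy - height / 2) < tol * height
    return {"fx": fx, "fy": fy, "cx": cx, "cy": cy,
            "principal_point_plausible": ok_cx and ok_cy}

# The matrix from camera_info.yaml above:
print(check_intrinsics([321.53, 0, 616.056, 0, 235.879, 616.279, 0, 0, 1],
                       640, 480))
# cx=616 and cy=616 are far from (320, 240) -- fx/cx and fy/cy look swapped.
```

If the focal lengths and principal point really are transposed in the config, that alone could explain badly wrong poses even when the belief maps look fine.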
I have provided you with the rgb image if you would like to reproduce it.
I also tried on this other image and actually got an error. The image quality seems ok.
(dope) mona@ard-gpu-01:~/research/dope/scripts$ python train2/inference.py --data /home/mona/research/dope/scripts/train2/CRAKER_input/
current working dir: /home/mona/research/dope/scripts
output is located in out_experiment
videopath: /home/mona/research/dope/scripts/train2/CRAKER_input/
image_files: ['/home/mona/research/dope/scripts/train2/CRAKER_input/rgb_image-0474.png']
j is /home/mona/research/dope/scripts/train2/CRAKER_input/rgb_image-0474.png and video_path is:
{'cracker': '/hdd/data/DOPE/FAT/DOPE_cracker_60epochs.pth'}
*****
model is: cracker
config['architectures'] : {'cracker': 'dope'}
Loading DOPE model '/hdd/data/DOPE/FAT/DOPE_cracker_60epochs.pth'...
/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
warnings.warn(msg)
Model loaded in 3.730365753173828 seconds.
loaded
Ctrl-C to stop
opt data: /home/mona/research/dope/scripts/train2/CRAKER_input/
i_image is: 0
len(imgs): 1
frame rgb_image-0474.png
frame type is <class 'numpy.ndarray'> and frame length 480
vertex2 size[0]: 9
vertex2 size: torch.Size([9, 50, 66])
peaks len: 2
peaks len: 2
peaks len: 1
peaks len: 1
peaks len: 1
peaks len: 1
peaks len: 1
peaks len: 1
peaks len: 1
len(all_peaks[-1]): 1
nb_object: 0
all_peaks[-1][nb_object][2]: 0.17209956
objects are: [[[24.527286052723493, 20.809007834487684], [(150.49666454821275, 179.65313678439105), None, (227.5058116730267, 114.63728967454797), (166.31783097397746, 202.3121531854768), (141.0992805853221, 147.04254498881758), None, (238.36749829886324, 149.34804699866842), None], [(0.014524836353883845, 5.947961316033755), None, (0.09030593212982914, 7.568182579162676), (0.07528098046072895, 5.834366722440402), (0.09024037718890608, 7.305403803038851), None, (0.14436224398914713, 5.6868650829343474), None], 0.17209956, [lists of None values, truncated]]]
Traceback (most recent call last):
File "/home/mona/research/dope/scripts/train2/inference.py", line 420, in <module>
dope_node.image_callback(
File "/home/mona/research/dope/scripts/train2/inference.py", line 229, in image_callback
results, beliefs = ObjectDetector.detect_object_in_image(
File "/home/mona/research/dope/scripts/train2/inference/detector.py", line 430, in detect_object_in_image
detected_objects = ObjectDetector.find_object_poses(vertex2, aff, pnp_solver, config)
File "/home/mona/research/dope/scripts/train2/inference/detector.py", line 495, in find_object_poses
cuboid2d = np.copy(points)
File "<__array_function__ internals>", line 200, in copy
File "/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/numpy/lib/function_base.py", line 960, in copy
return array(a, order=order, subok=subok, copy=True)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.
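The ValueError itself comes from NumPy (1.24 and later) refusing to build a numeric array from a list that mixes (x, y) tuples with None entries for the undetected keypoints, which is exactly what `points` looks like here when not all nine cuboid keypoints are found. A minimal reproduction, with made-up coordinates:

```python
import numpy as np

# Sketch of what goes wrong in find_object_poses: `points` holds one entry
# per cuboid keypoint, but undetected keypoints are None, e.g.:
points = [(150.5, 179.7), None, (227.5, 114.6), (166.3, 202.3),
          (141.1, 147.0), None, (238.4, 149.3), None, (24.5, 20.8)]

try:
    cuboid2d = np.copy(points)  # NumPy >= 1.24 rejects this ragged input
except ValueError as e:
    print("fails:", e)

# One possible workaround (my assumption, not the official fix): keep the
# Nones by building an object array instead of a numeric one.
cuboid2d = np.array(points, dtype=object)
print(cuboid2d.shape)
```

So the error fires precisely when the detector finds a partial set of keypoints; with all nine keypoints detected (no None entries) the copy succeeds.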
Here's another failed example
(dope) mona@ard-gpu-01:~/research/dope/scripts$ python train2/inference.py --data /home/mona/research/dope/scripts/train2/CRAKER_input/
current working dir: /home/mona/research/dope/scripts
output is located in out_experiment
videopath: /home/mona/research/dope/scripts/train2/CRAKER_input/
image_files: ['/home/mona/research/dope/scripts/train2/CRAKER_input/rgb_image-0578.png']
j is /home/mona/research/dope/scripts/train2/CRAKER_input/rgb_image-0578.png and video_path is:
{'cracker': '/hdd/data/DOPE/FAT/DOPE_cracker_60epochs.pth'}
*****
model is: cracker
config['architectures'] : {'cracker': 'dope'}
Loading DOPE model '/hdd/data/DOPE/FAT/DOPE_cracker_60epochs.pth'...
/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
warnings.warn(msg)
Model loaded in 3.7484540939331055 seconds.
loaded
Ctrl-C to stop
opt data: /home/mona/research/dope/scripts/train2/CRAKER_input/
i_image is: 0
len(imgs): 1
frame rgb_image-0578.png
frame type is <class 'numpy.ndarray'> and frame length 480
vertex2 size[0]: 9
vertex2 size: torch.Size([9, 50, 66])
peaks len: 2
peaks len: 3
peaks len: 1
peaks len: 2
peaks len: 2
peaks len: 2
peaks len: 1
peaks len: 2
peaks len: 1
len(all_peaks[-1]): 1
nb_object: 0
all_peaks[-1][nb_object][2]: 0.14351296
objects are: [[[18.185712728531254, 39.08536894189304], [(115.17971180892843, 338.7598551275025), None, None, None, (91.95717682675497, 299.90573085486346), (186.69301298149816, 262.0359816047784), None, None], [(0.1464781172948242, 4.997589937935652), None, None, None, (0.14896906810366484, 6.879044670624101), (0.0417484347608397, 8.161607967727617), None, None], 0.14351296, [lists of None values, truncated]]]
Traceback (most recent call last):
File "/home/mona/research/dope/scripts/train2/inference.py", line 420, in <module>
dope_node.image_callback(
File "/home/mona/research/dope/scripts/train2/inference.py", line 229, in image_callback
results, beliefs = ObjectDetector.detect_object_in_image(
File "/home/mona/research/dope/scripts/train2/inference/detector.py", line 430, in detect_object_in_image
detected_objects = ObjectDetector.find_object_poses(vertex2, aff, pnp_solver, config)
File "/home/mona/research/dope/scripts/train2/inference/detector.py", line 495, in find_object_poses
cuboid2d = np.copy(points)
File "<__array_function__ internals>", line 200, in copy
File "/home/mona/anaconda3/envs/dope/lib/python3.9/site-packages/numpy/lib/function_base.py", line 960, in copy
return array(a, order=order, subok=subok, copy=True)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (9,) + inhomogeneous part.
I would highly recommend using the weights we trained as a baseline. But as discussed in the paper, the results without domain randomization are not that great. If training your own model is important, I would recommend using nvisii to render your data. I never had great success using FAT.
Using cracker_60.pth I get the results at the end of this comment. Yes, I need to train on my own custom data, and before moving on to that I needed to make sure the training works. So, what data did you use to train cracker_60.pth? I was under the assumption that you used the entire FAT dataset. Could you please clarify what the data was? I need to be able to reproduce your training at least once before moving forward. sample 1 sample 2 sample 3
Yeah, if you read the paper, we used two datasets we generated with UE4. Right now I recommend that people use nvisii to generate data; a lot of people have had success with it. https://github.com/NVlabs/Deep_Object_Pose/tree/master/scripts/nvisii_data_gen
I trained DOPE on the entire FAT dataset for the cracker object and took this photo with an Intel RealSense camera. The results are not very good: no object is detected, only some heatmaps that are partially there. Any pointers on what can be changed in the inference code?
Any help is really appreciated.