NVlabs / FoundationPose

[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
https://nvlabs.github.io/FoundationPose/

argument 'depth' expects an array with 2 dimension(s) but the passed array has 3 dimension(s) #86

Closed shzcuber closed 3 weeks ago

shzcuber commented 3 weeks ago

Hi! I'm trying to run run_demo.py with custom data, which is available here: https://drive.google.com/drive/folders/1kgTLsdPFzi3turJK_z32xSxiX7kVVLKg?usp=drive_link.

I'm using a mesh file provided with the demo reference images (the cheezit box, the second object in the 16-reference-image set).

I'm getting the error below when running the code. I think the problem is in my depth files: if I replace them with blank PNGs like the ones in the demo data, the code runs, but the results are not correct. Does anyone know what is wrong with my data?


[__init__()] Using pretrained model from /home/donghun/shawn/FoundationPose/learning/training/../../weights/2023-10-28-18-33-37/model_best.pth
[__init__()] init done
[reset_object()] self.diameter:0.26214806661379636, vox_size:0.013107403330689818
[reset_object()] self.pts:torch.Size([1095, 3])
[reset_object()] reset done
[make_rotation_grid()] cam_in_obs:(42, 4, 4)
[make_rotation_grid()] rot_grid:(252, 4, 4)
num original candidates = 252
num of pose after clustering: 252
[make_rotation_grid()] after cluster, rot_grid:(252, 4, 4)
[make_rotation_grid()] self.rot_grid: torch.Size([252, 4, 4])
[<module>()] estimator initialization done
[<module>()] i:0 cheezit_data/cheezit_box/rgb/1.png
[register()] Welcome
Module Utils load on device 'cuda:0' took 431.33 ms
Traceback (most recent call last):
  File "run_demo.py", line 52, in <module>
    pose = est.register(K=reader.K, rgb=color, depth=depth, ob_mask=mask, iteration=args.est_refine_iter)
  File "/home/donghun/shawn/FoundationPose/estimater.py", line 173, in register
    depth = erode_depth(depth, radius=2, device='cuda')
  File "/home/donghun/shawn/FoundationPose/Utils.py", line 390, in erode_depth
    wp.launch(kernel=erode_depth_kernel, device=device, dim=[depth.shape[0], depth.shape[1]], inputs=[depth_wp, out_wp, radius, depth_diff_thres, ratio_thres, zfar],)
  File "/opt/conda/envs/my/lib/python3.8/site-packages/warp/context.py", line 4218, in launch
    pack_args(fwd_args, params)
  File "/opt/conda/envs/my/lib/python3.8/site-packages/warp/context.py", line 4190, in pack_args
    params.append(pack_arg(kernel, arg_type, arg_name, a, device, adjoint))
  File "/opt/conda/envs/my/lib/python3.8/site-packages/warp/context.py", line 3950, in pack_arg
    raise RuntimeError(
RuntimeError: Error launching kernel 'erode_depth_kernel', argument 'depth' expects an array with 2 dimension(s) but the passed array has 3 dimension(s).

sabinecelina commented 3 weeks ago

I got the same error. See #25: it seems your depth image has the wrong uint scale.
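
On the dimension error itself: the traceback reports that `depth` reached the kernel as a 3-dimensional array, which is what a PNG loaded with multiple channels (for example RGB or RGBA) looks like in NumPy. Below is a minimal sketch of a check that could be run on the depth array before it is passed to `est.register`. The helper name `to_single_channel_depth` is hypothetical and not part of FoundationPose, and the sketch assumes a multi-channel depth file simply repeats the same values in every channel.

```python
import numpy as np


def to_single_channel_depth(depth):
    """Return an (H, W) depth array from an (H, W) or (H, W, C) input.

    Hypothetical helper, not part of FoundationPose.
    """
    depth = np.asarray(depth)
    if depth.ndim == 2:
        # Already single-channel, nothing to do.
        return depth
    if depth.ndim == 3:
        first = depth[..., 0]
        # Assumption: a depth map saved as a color image repeats the same
        # values in every channel, so any one channel can be used.
        same = all(
            np.array_equal(first, depth[..., c])
            for c in range(1, depth.shape[2])
        )
        if same:
            return first
        # Channels differ (e.g. a colorized depth visualization or an
        # alpha channel): the raw depth cannot be recovered by slicing.
        raise ValueError(
            f"depth has shape {depth.shape} and its channels differ; "
            "re-export it as a single-channel image"
        )
    raise ValueError(f"unexpected depth shape {depth.shape}")
```

With OpenCV, `cv2.imread(path, cv2.IMREAD_UNCHANGED)` returns the channels as stored in the file, so printing `.shape` and `.dtype` on that result shows quickly whether the depth files are single-channel.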

utsavrai commented 6 days ago

I am getting the same error:

python run_demo.py 
Warp 1.0.2 initialized:
   CUDA Toolkit 11.5, Driver 12.3
   Devices:
     "cpu"      : "x86_64"
     "cuda:0"   : "NVIDIA GeForce RTX 4060 Laptop GPU" (8 GiB, sm_89, mempool enabled)
   Kernel cache:
     /home/utsav/.cache/warp/1.0.2
[__init__()] self.cfg: 
 lr: 0.0001
c_in: 6
zfar: 'Infinity'
debug: null
n_view: 1
run_id: 3wy8qqex
use_BN: true
exp_name: 2024-01-11-20-02-45
n_epochs: 62
save_dir: /home/bowenw/debug/2024-01-11-20-02-45/
use_mask: false
loss_type: pairwise_valid
optimizer: adam
batch_size: 64
crop_ratio: 1.1
enable_amp: true
use_normal: false
max_num_key: null
warmup_step: -1
input_resize:
- 160
- 160
max_step_val: 1000
vis_interval: 1000
weight_decay: 0
normalize_xyz: true
resume_run_id: null
clip_grad_norm: 'Infinity'
lr_epoch_decay: 500
render_backend: nvdiffrast
train_num_pair: 5
lr_decay_epochs:
- 50
n_epochs_warmup: 1
make_pair_online: false
gradient_max_norm: 'Infinity'
max_step_per_epoch: 10000
n_rendering_workers: 1
save_epoch_interval: 100
n_dataloader_workers: 100
split_objects_across_gpus: true
ckpt_dir: /home/utsav/IProject/FoundationPose/learning/training/../../weights/2024-01-11-20-02-45/model_best.pth

[__init__()] self.h5_file:None
[__init__()] Using pretrained model from /home/utsav/IProject/FoundationPose/learning/training/../../weights/2024-01-11-20-02-45/model_best.pth
[__init__()] init done
[__init__()] welcome
[__init__()] self.cfg: 
 lr: 0.0001
c_in: 6
zfar: .inf
debug: null
w_rot: 0.1
n_view: 1
run_id: null
use_BN: true
rot_rep: axis_angle
ckpt_dir: /home/utsav/IProject/FoundationPose/learning/training/../../weights/2023-10-28-18-33-37/model_best.pth
exp_name: 2023-10-28-18-33-37
save_dir: /tmp/2023-10-28-18-33-37/
loss_type: l2
optimizer: adam
trans_rep: tracknet
batch_size: 64
crop_ratio: 1.2
use_normal: false
BN_momentum: 0.1
max_num_key: null
warmup_step: -1
input_resize:
- 160
- 160
max_step_val: 1000
normal_uint8: false
vis_interval: 1000
weight_decay: 0
n_max_objects: null
normalize_xyz: true
clip_grad_norm: 'Infinity'
rot_normalizer: 0.3490658503988659
trans_normalizer:
- 0.019999999552965164
- 0.019999999552965164
- 0.05000000074505806
max_step_per_epoch: 25000
val_epoch_interval: 10
n_dataloader_workers: 60
enable_amp: true
use_mask: false

[__init__()] self.h5_file:
[__init__()] Using pretrained model from /home/utsav/IProject/FoundationPose/learning/training/../../weights/2023-10-28-18-33-37/model_best.pth
[__init__()] init done
[reset_object()] self.diameter:0.013781790921357066, vox_size:0.003
[reset_object()] self.pts:torch.Size([35, 3])
[reset_object()] reset done
[make_rotation_grid()] cam_in_obs:(42, 4, 4)
[make_rotation_grid()] rot_grid:(252, 4, 4)
num original candidates = 252
num of pose after clustering: 252
[make_rotation_grid()] after cluster, rot_grid:(252, 4, 4)
[make_rotation_grid()] self.rot_grid: torch.Size([252, 4, 4])
[<module>()] estimator initialization done
[<module>()] i:0
[register()] Welcome
Module Utils load on device 'cuda:0' took 5.51 ms
Traceback (most recent call last):
  File "/home/utsav/IProject/FoundationPose/run_demo.py", line 52, in <module>
    pose = est.register(K=reader.K, rgb=color, depth=depth, ob_mask=mask, iteration=args.est_refine_iter)
  File "/home/utsav/IProject/FoundationPose/estimater.py", line 173, in register
    depth = erode_depth(depth, radius=2, device='cuda')
  File "/home/utsav/IProject/FoundationPose/Utils.py", line 390, in erode_depth
    wp.launch(kernel=erode_depth_kernel, device=device, dim=[depth.shape[0], depth.shape[1]], inputs=[depth_wp, out_wp, radius, depth_diff_thres, ratio_thres, zfar],)
  File "/home/utsav/anaconda3/envs/foundationpose/lib/python3.9/site-packages/warp/context.py", line 4240, in launch
    pack_args(fwd_args, params)
  File "/home/utsav/anaconda3/envs/foundationpose/lib/python3.9/site-packages/warp/context.py", line 4212, in pack_args
    params.append(pack_arg(kernel, arg_type, arg_name, a, device, adjoint))
  File "/home/utsav/anaconda3/envs/foundationpose/lib/python3.9/site-packages/warp/context.py", line 3972, in pack_arg
    raise RuntimeError(
RuntimeError: Error launching kernel 'erode_depth_kernel', argument 'depth' expects an array with 2 dimension(s) but the passed array has 3 dimension(s).

The model, point cloud, and depth are all in mm scale. My files are at this link: https://drive.google.com/drive/folders/1N6HkHg1ASygzplz7SEjm5KxJSVvV5jMC?usp=sharing

Since this is a surgical application, I have to use mm scale.
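
For reference, the first log in this thread reports `self.diameter` of about 0.26 for the cheezit box mesh, which is consistent with the demo data being in meters. If the pipeline does expect meters (an assumption worth checking against how your data reader scales depth), millimeter data can be kept on disk and converted only at load time. A minimal sketch follows; the helper names `depth_mm_to_m` and `vertices_mm_to_m` are hypothetical and not part of FoundationPose.

```python
import numpy as np

MM_PER_M = 1000.0


def depth_mm_to_m(depth_mm):
    """Convert an (H, W) depth map from millimeters to meters (float32)."""
    depth_mm = np.asarray(depth_mm)
    if depth_mm.ndim != 2:
        # The erode_depth kernel in the traceback requires a 2-D array.
        raise ValueError(f"expected (H, W) depth, got shape {depth_mm.shape}")
    return depth_mm.astype(np.float32) / MM_PER_M


def vertices_mm_to_m(vertices_mm):
    """Convert an (N, 3) array of mesh vertices from millimeters to meters."""
    return np.asarray(vertices_mm, dtype=np.float64) / MM_PER_M
```

An estimated pose translation obtained this way would then be in meters as well, and can be multiplied by 1000 to report it in mm.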