cudaMalloc when running on custom data

rorrewang commented 1 month ago

Thank you very much for your outstanding work! I can successfully run the test data you provided, but I ran into difficulties when running your code on my own data. I generated the mesh, RGB, mask, and depth files separately and converted them to the corresponding formats. Specifically, I saved the depth as uint16, checked the mesh by visualizing it with trimesh, and after setting debug to 3 all the outputs looked OK. However, I encountered this error:
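For reference, here is a minimal sketch of the depth conversion described above. It assumes the loader reads depth PNGs back with cv2.imread(path, -1) / 1e3, so depth is stored as uint16 millimeters. FoundationPose's demo reader appears to do this, but check your own reader. The function name depth_m_to_uint16_mm is hypothetical.

```python
import numpy as np


def depth_m_to_uint16_mm(depth_m: np.ndarray) -> np.ndarray:
    """Convert a float depth map in meters to uint16 millimeters.

    Assumption: the loader divides the stored uint16 values by 1000,
    so 0 means "no depth" and 1 unit = 1 mm.
    """
    # Replace NaN/inf with 0 (invalid) before scaling to millimeters
    depth_mm = np.nan_to_num(depth_m, nan=0.0, posinf=0.0, neginf=0.0) * 1000.0
    # uint16 tops out at 65535 mm (~65.5 m); zero out values that would wrap
    depth_mm[(depth_mm < 0) | (depth_mm > np.iinfo(np.uint16).max)] = 0
    return np.round(depth_mm).astype(np.uint16)


if __name__ == "__main__":
    # Values taken from the min/max printed in the log, assuming they are meters
    d = np.array([[5.296, 18.844], [0.0, np.nan]], dtype=np.float32)
    print(depth_m_to_uint16_mm(d))
```

Write the result with cv2.imwrite(path, depth_mm). OpenCV stores uint16 arrays as 16-bit PNGs. Out-of-range values are zeroed instead of cast directly, because a plain astype(np.uint16) would not clip them to a valid depth.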

(found) wly@wly:/mnt/PC/CADGS/third_party/FoundationPose$ python run_desk.py 
/home/wly/anaconda3/envs/found/lib/python3.9/site-packages/kornia/feature/lightglue.py:44: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
Warp 1.0.2 initialized:
   CUDA Toolkit 11.5, Driver 12.4
   Devices:
     "cpu"      : "x86_64"
     "cuda:0"   : "NVIDIA GeForce RTX 4060 Ti" (16 GiB, sm_89, mempool enabled)
   Kernel cache:
     /home/wly/.cache/warp/1.0.2
vertices and faces 293224 553772
[<module>()] Mesh loaded successfully with 293224 vertices and 553772 faces.
[__init__()] self.cfg: 
 lr: 0.0001
c_in: 6
zfar: 'Infinity'
debug: null
n_view: 1
run_id: 3wy8qqex
use_BN: true
exp_name: 2024-01-11-20-02-45
n_epochs: 62
save_dir: /home/bowenw/debug/2024-01-11-20-02-45/
use_mask: false
loss_type: pairwise_valid
optimizer: adam
batch_size: 64
crop_ratio: 1.1
enable_amp: true
use_normal: false
max_num_key: null
warmup_step: -1
input_resize:
- 160
- 160
max_step_val: 1000
vis_interval: 1000
weight_decay: 0
normalize_xyz: true
resume_run_id: null
clip_grad_norm: 'Infinity'
lr_epoch_decay: 500
render_backend: nvdiffrast
train_num_pair: 5
lr_decay_epochs:
- 50
n_epochs_warmup: 1
make_pair_online: false
gradient_max_norm: 'Infinity'
max_step_per_epoch: 10000
n_rendering_workers: 1
save_epoch_interval: 100
n_dataloader_workers: 100
split_objects_across_gpus: true
ckpt_dir: /mnt/PC/CADGS/third_party/FoundationPose/learning/training/../../weights/2024-01-11-20-02-45/model_best.pth

[__init__()] self.h5_file:None
[__init__()] Using pretrained model from /mnt/PC/CADGS/third_party/FoundationPose/learning/training/../../weights/2024-01-11-20-02-45/model_best.pth
/mnt/PC/CADGS/third_party/FoundationPose/learning/training/predict_score.py:151: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(ckpt_dir)
[__init__()] init done
[__init__()] welcome
[__init__()] self.cfg: 
 lr: 0.0001
c_in: 6
zfar: .inf
debug: null
w_rot: 0.1
n_view: 1
run_id: null
use_BN: true
rot_rep: axis_angle
ckpt_dir: /mnt/PC/CADGS/third_party/FoundationPose/learning/training/../../weights/2023-10-28-18-33-37/model_best.pth
exp_name: 2023-10-28-18-33-37
save_dir: /tmp/2023-10-28-18-33-37/
loss_type: l2
optimizer: adam
trans_rep: tracknet
batch_size: 64
crop_ratio: 1.2
use_normal: false
BN_momentum: 0.1
max_num_key: null
warmup_step: -1
input_resize:
- 160
- 160
max_step_val: 1000
normal_uint8: false
vis_interval: 1000
weight_decay: 0
n_max_objects: null
normalize_xyz: true
clip_grad_norm: 'Infinity'
rot_normalizer: 0.3490658503988659
trans_normalizer:
- 0.019999999552965164
- 0.019999999552965164
- 0.05000000074505806
max_step_per_epoch: 25000
val_epoch_interval: 10
n_dataloader_workers: 60
enable_amp: true
use_mask: false

[__init__()] self.h5_file:
[__init__()] Using pretrained model from /mnt/PC/CADGS/third_party/FoundationPose/learning/training/../../weights/2023-10-28-18-33-37/model_best.pth
/mnt/PC/CADGS/third_party/FoundationPose/learning/training/predict_pose_refine.py:138: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(ckpt_dir)
[__init__()] init done
/home/wly/anaconda3/envs/found/lib/python3.9/site-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
[reset_object()] self.diameter:4.272459633658513, vox_size:0.21362298168292565
[reset_object()] self.pts:torch.Size([587, 3])
[reset_object()] reset done
[make_rotation_grid()] cam_in_obs:(42, 4, 4)
[make_rotation_grid()] rot_grid:(252, 4, 4)
num original candidates = 252
num of pose after clustering: 252
[make_rotation_grid()] after cluster, rot_grid:(252, 4, 4)
[make_rotation_grid()] self.rot_grid: torch.Size([252, 4, 4])
[<module>()] estimator initialization done
self.color_files ['/mnt/PC/CADGS/third_party/FoundationPose/demo_data/desk/rgb/0.png', '/mnt/PC/CADGS/third_party/FoundationPose/demo_data/desk/rgb/1.png', '/mnt/PC/CADGS/third_party/FoundationPose/demo_data/desk/rgb/10.png', '/mnt/PC/CADGS/third_party/FoundationPose/demo_data/desk/rgb/100.png', '/mnt/PC/CADGS/third_party/FoundationPose/demo_data/desk/rgb/101.png', '/mnt/PC/CADGS/third_party/FoundationPose/demo_data/desk/rgb/102.png', '/mnt/PC/CADGS/third_party/FoundationPose/demo_data/desk/rgb/103.png', '/mnt/PC/CADGS/third_party/FoundationPose/demo_data/desk/rgb/104.png', '/mnt/PC/CADGS/third_party/FoundationPose/demo_data/desk/rgb/105.png', '/mnt/PC/CADGS/third_party/FoundationPose/demo_data/desk/rgb/106.png']
[<module>()] i:0
[[585.35394   0.      313.     ]
 [  0.      587.7681  176.     ]
 [  0.        0.        1.     ]]
mask (352, 626)
color (352, 626, 3)
depth (352, 626) 5.296 18.844
[register()] Welcome
Module Utils load on device 'cuda:0' took 4.90 ms
[register()] poses:(252, 4, 4)
[register()] after viewpoint, add_errs min:-1.0
[register()] xyz_map shape: (352, 626, 3)
[register()] poses shape: torch.Size([252, 4, 4])
[register()] depth shape: (352, 626)
[register()] K shape: (3, 3)
[register()] rgb shape: (352, 626, 3)
[register()] ob_mask shape: (352, 626)
/home/wly/anaconda3/envs/found/lib/python3.9/site-packages/torch/__init__.py:955: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:432.)
  _C._set_default_tensor_type(t)
[predict()] ob_in_cams:(252, 4, 4)
[predict()] self.cfg.use_normal:False
[predict()] trans_normalizer:[0.019999999552965164, 0.019999999552965164, 0.05000000074505806], rot_normalizer:0.3490658503988659
[predict()] making cropped data
[make_crop_data_batch()] Welcome make_crop_data_batch
[make_crop_data_batch()] make tf_to_crops done
raster_ctx.cpp_wrapper <nvdiffrast_plugin.RasterizeCRStateWrapper object at 0x785b71c46830>
pos tensor([[[ 4.4728, -4.9335,  9.7525,  9.7543],
         [ 4.4992, -4.9335,  9.7520,  9.7538],
         [ 4.4712, -4.9707,  9.7522,  9.7540],
         ...,
         [-0.2787,  3.5349,  9.3407,  9.3425],
         [-0.2768,  3.5318,  9.3410,  9.3428],
         [-0.2609,  3.5353,  9.3545,  9.3563]],

        [[ 6.5129,  1.2849,  9.7525,  9.7543],
         [ 6.5260,  1.3079,  9.7520,  9.7538],
         [ 6.5441,  1.2650,  9.7522,  9.7540],
         ...,
         [-3.2274,  1.4667,  9.3407,  9.3425],
         [-3.2237,  1.4668,  9.3410,  9.3428],
         [-3.2168,  1.4798,  9.3545,  9.3563]],

        [[ 2.1697,  6.1681,  9.7525,  9.7543],
         [ 2.1563,  6.1911,  9.7520,  9.7538],
         [ 2.2025,  6.1854,  9.7522,  9.7540],
         ...,
         [-2.9180, -2.1314,  9.3407,  9.3425],
         [-2.9162, -2.1283,  9.3410,  9.3428],
         [-2.9219, -2.1184,  9.3545,  9.3563]],

        ...,

        [[-1.0277,  4.8601, 10.6114, 10.6132],
         [-1.0418,  4.8603, 10.6172, 10.6190],
         [-1.0278,  4.8973, 10.6109, 10.6127],
         ...,
         [-0.1882, -3.6620,  9.3266,  9.3284],
         [-0.1880, -3.6588,  9.3272,  9.3290],
         [-0.1497, -3.6616,  9.3372,  9.3390]],

        [[-4.5573,  1.2325, 10.6114, 10.6132],
         [-4.5637,  1.2192, 10.6172, 10.6190],
         [-4.5895,  1.2511, 10.6109, 10.6127],
         ...,
         [ 3.0229, -2.0502,  9.3266,  9.3284],
         [ 3.0203, -2.0487,  9.3272,  9.3290],
         [ 3.0434, -2.0187,  9.3372,  9.3390]],

        [[-3.1935, -3.6507, 10.6114, 10.6132],
         [-3.1843, -3.6640, 10.6172, 10.6190],
         [-3.2257, -3.6693, 10.6109, 10.6127],
         ...,
         [ 3.2383,  1.5480,  9.3266,  9.3284],
         [ 3.2358,  1.5464,  9.3272,  9.3290],
         [ 3.2229,  1.5795,  9.3372,  9.3390]]])
tri tensor([[     0,      3,      2],
        [     1,      3,      0],
        [     3,      1,      4],
        ...,
        [ 72312,  72274, 293223],
        [ 72314,  72312, 293223],
        [176953,  72314, 293223]], dtype=torch.int32)
resolution (160, 160)
ranges tensor([], device='cpu', size=(0, 2), dtype=torch.int32)
peeling_idx -1
Traceback (most recent call last):
  File "/mnt/PC/CADGS/third_party/FoundationPose/run_desk.py", line 71, in <module>
    pose = est.register(K=reader.K, rgb=color, depth=depth, ob_mask=mask, iteration=args.est_refine_iter)
  File "/mnt/PC/CADGS/third_party/FoundationPose/estimater.py", line 223, in register
    poses, vis = self.refiner.predict(mesh=self.mesh, mesh_tensors=self.mesh_tensors, rgb=rgb, depth=depth, K=K, ob_in_cams=poses.data.cpu().numpy(), normal_map=normal_map, xyz_map=xyz_map, glctx=self.glctx, mesh_diameter=self.diameter, iteration=iteration, get_vis=self.debug>=2)
  File "/home/wly/anaconda3/envs/found/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/PC/CADGS/third_party/FoundationPose/learning/training/predict_pose_refine.py", line 184, in predict
    pose_data = make_crop_data_batch(self.cfg.input_resize, B_in_cams, mesh_centered, rgb_tensor, depth_tensor, K, crop_ratio=crop_ratio, normal_map=normal_map, xyz_map=xyz_map_tensor, cfg=self.cfg, glctx=glctx, mesh_tensors=mesh_tensors, dataset=self.dataset, mesh_diameter=mesh_diameter)
  File "/home/wly/anaconda3/envs/found/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/PC/CADGS/third_party/FoundationPose/learning/training/predict_pose_refine.py", line 49, in make_crop_data_batch
    rgb_r, depth_r, normal_r = nvdiffrast_render(K=K, H=H, W=W, ob_in_cams=poseA[b:b+bs], context='cuda', get_normal=cfg['use_normal'], glctx=glctx, mesh_tensors=mesh_tensors, output_size=cfg['input_resize'], bbox2d=bbox2d_ori[b:b+bs], use_light=True, extra=extra)
  File "/mnt/PC/CADGS/third_party/FoundationPose/Utils.py", line 182, in nvdiffrast_render
    rast_out, _ = dr.rasterize(glctx, pos_clip, pos_idx, resolution=np.asarray(output_size))
  File "/home/wly/anaconda3/envs/found/lib/python3.9/site-packages/nvdiffrast/torch/ops.py", line 325, in rasterize
    return _rasterize_func.apply(glctx, pos, tri, resolution, ranges, grad_db, -1)
  File "/home/wly/anaconda3/envs/found/lib/python3.9/site-packages/torch/autograd/function.py", line 574, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/wly/anaconda3/envs/found/lib/python3.9/site-packages/nvdiffrast/torch/ops.py", line 262, in forward
    out, out_db = _get_plugin().rasterize_fwd_cuda(raster_ctx.cpp_wrapper, pos, tri, resolution, ranges, peeling_idx)
RuntimeError: Cuda error: 2[cudaMalloc(&m_gpuPtr, bytes);]

Can you give me some advice on how to solve this bug? Or how can I get more details about the error?

abhishekmonogram commented 1 month ago

You are running out of GPU memory (CUDA error 2 from cudaMalloc is cudaErrorMemoryAllocation). Either the mesh you load is too heavy (it has too many faces) or the input image you pass in is too large. If your GPU doesn't have enough VRAM, FoundationPose won't work.
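To see why this particular call fails, here is a rough back-of-the-envelope sketch using the numbers in the log: 252 pose hypotheses, 293224 vertices, and 553772 faces. It assumes all 252 hypotheses are rasterized in one batched call with a float32 position tensor of shape [poses, vertices, 4] and a shared int32 [faces, 3] index tensor, as the pos/tri dump before the traceback suggests. nvdiffrast's internal per-triangle work buffers are not modeled, so the real footprint is larger. rasterize_input_bytes is a hypothetical helper.

```python
def rasterize_input_bytes(num_poses: int, num_vertices: int, num_faces: int) -> dict:
    """Lower-bound size of the tensors handed to one batched rasterize call.

    Assumption: clip-space positions are float32 [num_poses, num_vertices, 4]
    and triangle indices are int32 [num_faces, 3], shared across poses.
    """
    pos_bytes = num_poses * num_vertices * 4 * 4  # x, y, z, w as float32 per vertex per pose
    tri_bytes = num_faces * 3 * 4                 # three int32 indices per face
    return {
        "pos_bytes": pos_bytes,
        "tri_bytes": tri_bytes,
        # every face is rasterized once per pose hypothesis
        "triangle_instances": num_poses * num_faces,
    }


if __name__ == "__main__":
    est = rasterize_input_bytes(num_poses=252, num_vertices=293224, num_faces=553772)
    print(f"pos tensor: {est['pos_bytes'] / 2**30:.2f} GiB")         # pos tensor: 1.10 GiB
    print(f"triangle instances: {est['triangle_instances']:,}")      # triangle instances: 139,550,544
```

Both quantities scale linearly with vertex and face count, which is why decimating the mesh reduces the load on the rasterizer directly.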

A couple of things you could try:

Set the shorter_side parameter in the demo to something smaller, such as 360 or 480 instead of 720. shorter_side resizes the input so that its shorter side matches that value. The downside is that the depth map resolution is lowered too, so if your object has fine details that FoundationPose relies on to estimate the pose, the result may be less accurate.
Decimate the model mesh if it has too many faces. Any tool such as Blender can decimate it and reduce the face count.

If neither of these works, try a GPU with more VRAM.
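Here is a minimal sketch of what a shorter-side resize does to the image size and intrinsics. It assumes the reader computes scale = shorter_side / min(H, W), resizes the RGB, depth, and mask by that factor, and multiplies the first two rows of K by it. Verify this against your reader's code. The helper name shorter_side_scale is hypothetical. The K matrix and image size are the ones printed in the log.

```python
def shorter_side_scale(H: int, W: int, K: list, shorter_side: int):
    """Return the scale factor, new (H, W) and rescaled 3x3 intrinsics.

    Assumption: fx, skew, cx (row 0) and fy, cy (row 1) scale with the image;
    the homogeneous row [0, 0, 1] stays unchanged.
    """
    scale = shorter_side / min(H, W)
    new_hw = (int(H * scale), int(W * scale))
    K_scaled = [[v * scale for v in K[0]], [v * scale for v in K[1]], list(K[2])]
    return scale, new_hw, K_scaled


if __name__ == "__main__":
    K = [[585.35394, 0.0, 313.0], [0.0, 587.7681, 176.0], [0.0, 0.0, 1.0]]
    scale, (h, w), K2 = shorter_side_scale(352, 626, K, shorter_side=176)
    print(scale, (h, w))  # 0.5 (176, 313)
    print(K2)
```

In this log the frames are already 352x626, below the suggested 360 or 480. The failing rasterize call also renders at resolution (160, 160) regardless of input size. So here the mesh, not the image, was the main lever, which matches the fix reported below.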

rorrewang commented 1 month ago

@abhishekmonogram Yes, you are right. I was careless and forgot to check this. I reduced my mesh file from 16 MB to 2 MB and now it works. Thank you so much for your help :)

NVlabs / FoundationPose

cudaMalloc when running on custom data #249