SeanChenxy / HandAvatar

MIT License

Illegal memory access smpl_body.coap.query during render #2

Status: Closed (relh closed 1 year ago)

relh commented 1 year ago

I'm not sure whether my install is wrong, but I currently get the error below. It occurs during rendering, so it may be a problem with LEAP?

(base) ➜  HandAvatar git:(main) ✗ CUDA_LAUNCH_BLOCKING=1 ./handavatar/scripts/run_hand.sh
------------------ GPU Configurations ------------------
Primary GPUs: [0]
Secondary GPUs: [0]
--------------------------------------------------------
------------------ GPU Configurations ------------------
Primary GPUs: [0]
Secondary GPUs: [0]
--------------------------------------------------------
MANO-HD in Model
upsample mano to  3093
upsample mano to  12337
load network from  handavatar/out/handavatar/interhand/test_Capture0_ROM04_RT_Occlusion/pretrained_model/latest.tar
[Dataset Path] data/InterHand/5
Load annotation data/InterHand/5/InterHand2.6M_5fps_batch1/preprocess/test/Capture0/ROM03_RT_No_Occlusion/anno_cam.pkl
 -- Total Frames: 194
The rendering is saved in handavatar/out/handavatar/interhand/test_Capture0_ROM04_RT_Occlusion/pretrained_model/latest/5/test/Capture0/ROM03_RT_No_Occlusion/ori
0 194
Traceback (most recent call last):
  File "/private/home/relh/HandAvatar/handavatar/run_interhand.py", line 239, in <module>
    run()
  File "/private/home/relh/mambaforge/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/private/home/relh/HandAvatar/handavatar/run_interhand.py", line 162, in run
    net_output = model(**data,
  File "/private/home/relh/mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "handavatar/core/nets/handavatar/network.py", line 542, in forward
    all_ret = self._batchify_rays(packed_ray_infos, **kwargs)
  File "handavatar/core/nets/handavatar/network.py", line 241, in _batchify_rays
    ret = self._render_rays(rays_flat[i:i+cfg.chunk], **kwargs)
  File "handavatar/core/nets/handavatar/network.py", line 299, in _render_rays
    query_result = self._query_mlp(
  File "handavatar/core/nets/handavatar/network.py", line 127, in _query_mlp
    result = self._apply_mlp_kernals(
  File "handavatar/core/nets/handavatar/network.py", line 200, in _apply_mlp_kernals
    alpha, part_info = self.smpl_body.coap.query(xyz[None], smpl_output, ret_intermediate=True)
  File "handavatar/core/nets/handavatar/pairof/pairof_render.py", line 696, in query
    gdists, gindices = (points[:, :, None, :] - global_points[:, None, :, :]).norm(dim=-1).topk(self.neighbor, dim=-1, largest=False)
RuntimeError: CUDA error: an illegal memory access was encountered

relh commented 1 year ago

Looks like this is a PyTorch versioning issue: https://github.com/pytorch/pytorch/issues/82569

EDIT: Upgrading PyTorch from 1.13.1 to 2.0, as suggested in that thread, fixed my issue.

(Pdb) (points[:, :, None, :] - global_points[:, None, :, :]).norm(dim=-1).topk(self.neighbor, dim=-1, largest=False)
torch.return_types.topk(
values=tensor([[[0.5599, 0.5635, 0.5635, 0.5658],
         [0.5589, 0.5625, 0.5626, 0.5648],
         [0.5579, 0.5615, 0.5616, 0.5638],
         ...,
         [0.1095, 0.1141, 0.1143, 0.1168],
         [0.1036, 0.1049, 0.1066, 0.1079],
         [0.0942, 0.0949, 0.0966, 0.0987]]], device='cuda:0'),
indices=tensor([[[1766, 1400, 2326, 2366],
         [1766, 1400, 2326, 2366],
         [1766, 1400, 2326, 2366],
         ...,
         [2057,  782, 2919, 2132],
         [2057, 2132, 1173,  189],
         [2132, 1173,  189, 3077]]], device='cuda:0'))
(Pdb) --KeyboardInterrupt--
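
Since the fix reported here was a PyTorch upgrade (1.13.1 to 2.0), a small guard run before rendering can flag an affected install early. The following is a minimal sketch, not part of HandAvatar: `parse_version`, `needs_upgrade`, and `MIN_FIXED` are hypothetical names, and the 2.0 threshold is an assumption taken only from the report in this thread.

```python
import re

# Assumption: versions below 2.0 may hit the illegal memory access in topk,
# based solely on the report in this thread (1.13.1 failed, 2.0 worked).
MIN_FIXED = (2, 0)


def parse_version(version: str) -> tuple:
    """Extract (major, minor) from strings such as '1.13.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)", str(version))
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_upgrade(version: str) -> bool:
    """Return True if the given version is older than MIN_FIXED."""
    return parse_version(version) < MIN_FIXED
```

In practice this would be called as `needs_upgrade(torch.__version__)` near the top of the render script, printing a warning or exiting when it returns `True`.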