MarilynKeller / SKEL

Release for the Siggraph Asia 2023 SKEL paper "From Skin to Skeleton: Towards Biomechanically Accurate 3D Digital Humans".
https://skel.is.tue.mpg.de/

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm) #14

Open zmhsmart opened 2 months ago

zmhsmart commented 2 months ago

Hello, thank you for your excellent work. I tried to reproduce it and got the following error. Can you help me solve this problem? Thank you very much.

```
$ python examples/skel_poses.py --gender male
Error processing line 1 of /home/pbc/miniconda3/envs/zxh/lib/python3.8/site-packages/psbody-mesh-nspkg.pth:

Traceback (most recent call last):
  File "/home/pbc/miniconda3/envs/zxh/lib/python3.8/site.py", line 169, in addpackage
    exec(line)
  File "<string>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 553, in module_from_spec
AttributeError: 'NoneType' object has no attribute 'loader'

Remainder of file ignored
Traceback (most recent call last):
  File "examples/skel_poses.py", line 126, in <module>
    skel_seq = SKELSequence(skel_layer=skel_model, betas=betas, poses_body=poses, poses_type='skel',
  File "/home/pbc/project/inference/SKEL/aitviewer-skel/aitviewer/renderables/skel.py", line 144, in __init__
    skel_output = self.fk()
  File "/home/pbc/project/inference/SKEL/aitviewer-skel/aitviewer/renderables/skel.py", line 441, in fk
    skel_output = self.skel_layer(poses=poses_body, betas=betas, trans=trans, poses_type=self.poses_type)
  File "/home/pbc/miniconda3/envs/zxh/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pbc/project/inference/SKEL/skel/skel_model.py", line 312, in forward
    v_shaped = skin_v0 + torch.matmul(shapedirs, betas).view(B, Ns, 3)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)
```

MarilynKeller commented 2 months ago

Hi,

It looks like I recently introduced this bug by setting aitviewer to use the GPU by default; I need to fix that.

In the meantime, you can work around it by setting this parameter to `device: "cpu"`: https://github.com/MarilynKeller/aitviewer-skel/blob/cd4e851e05b00901e9ea9b3201d09f565b0dedf1/aitviewer/aitvconfig.yaml#L18
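For reference, the workaround amounts to a one-line change in the config file (the `device` key name is taken from the linked line; this is a sketch of the edit, not the full file):

```yaml
# aitviewer/aitvconfig.yaml -- force renderables onto the CPU
device: "cpu"
```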

Or, if you want to use the GPU:

Here: https://github.com/MarilynKeller/SKEL/blob/832f9a589215c934b4fadf1b5d03f1dab5ef7999/examples/skel_poses.py#L126

Move `poses`, `betas` and `trans` to the GPU:

```python
poses = poses.to("cuda:0")
...
```
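A more general version of this fix (a hypothetical helper, not part of the SKEL codebase) is to move every input tensor to whatever device the model's parameters or buffers live on, so the same script works whether the config says `cpu` or `cuda`:

```python
import torch


def to_model_device(model, *tensors):
    """Move each input tensor to the device the model lives on.

    Hypothetical helper, not part of SKEL: it avoids the cpu/cuda:0
    mismatch by making the inputs follow the model, whichever device
    it was placed on.
    """
    # Fall back to buffers in case the module holds no trainable parameters.
    refs = list(model.parameters()) or list(model.buffers())
    device = refs[0].device
    return tuple(t.to(device) for t in tensors)
```

Usage would then be something like `poses, betas, trans = to_model_device(skel_model, poses, betas, trans)` before constructing `SKELSequence`.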

zmhsmart commented 2 months ago

[screenshots attached]

```
pbc@wcs-Pre:~/project/inference/SKEL$ python examples/skel_poses.py --gender male
Traceback (most recent call last):
  File "examples/skel_poses.py", line 140, in <module>
    skel_seq = SKELSequence(skel_layer=skel_model, betas=betas, poses_body=poses, poses_type='skel',
  File "/home/pbc/project/inference/SKEL/aitviewer-skel/aitviewer/renderables/skel.py", line 144, in __init__
    skel_output = self.fk()
  File "/home/pbc/project/inference/SKEL/aitviewer-skel/aitviewer/renderables/skel.py", line 441, in fk
    skel_output = self.skel_layer(poses=poses_body, betas=betas, trans=trans, poses_type=self.poses_type)
  File "/home/pbc/miniconda3/envs/zxh/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/pbc/project/inference/SKEL/skel/skel_model.py", line 511, in forward
    T = torch.matmul(self.skel_weights_rigid, Gk01.permute(1, 0, 2, 3).contiguous().view(Nj, -1)).view(Nk, B, 4, 4).transpose(0, 1)  # [1, 48757, 3, 3]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.52 GiB (GPU 0; 23.68 GiB total capacity; 17.44 GiB already allocated; 4.81 GiB free; 17.54 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

I first moved `poses`, `betas`, and `trans` to the GPU, but the same error was still raised. Then I also moved the SKEL model to the GPU, but the GPU ran out of memory. My graphics card is a 3090 with 24 GB of VRAM, and the error shown in the screenshots appeared. How should I solve this? Thank you for your reply.

MarilynKeller commented 2 months ago

Hi, then set the config to CPU as I suggested. The sequence is simply too long to be loaded as meshes on your GPU, which is fine.