YuelangX / Gaussian-Head-Avatar

[CVPR 2024] Official repository for "Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians"

Nan issue #25

Open twofatcat opened 4 months ago

twofatcat commented 4 months ago

Hi Dr. Yue, I am new to this area. I have completed the previous steps. When running your 'train_gaussianhead.py', an error occurs: "CUDA error: an illegal memory access was encountered". I traced the problem to "CameraModule.py", line 189, in render_gaussian.

Next, I checked the inputs to render_gaussian using
"""
print(f"\nmeans3D shape: {means3D.shape}, means2D shape: {means2D.shape}")
print(f"\ncolors_precomp shape: {colors_precomp[b].shape}, opacities shape: {opacity[b].shape}")
print(f"\nscales shape: {scales[b].shape}, rotations shape: {rotations[b].shape}")

# Print a sample of the input data to make sure it is valid
print("\nmeans3D sample:", means3D[:2])
print("\nmeans2D sample:", means2D[:2])
print("\ncolors_precomp sample:", colors_precomp[b][:2])
print("\nopacities sample:", opacity[b][:2])
print("\nscales sample:", scales[b][:2])
print("\nrotations sample:", rotations[b][:2])
torch.cuda.synchronize()
"""
and found this:
"""
means3D shape: torch.Size([143961, 3]), means2D shape: torch.Size([143961, 3])

colors_precomp shape: torch.Size([143961, 32]), opacities shape: torch.Size([143961, 1])

scales shape: torch.Size([143961, 3]), rotations shape: torch.Size([143961, 4])

means3D sample: tensor([[ 0.0186, -0.0408, -0.1247], [ 0.0231, -0.0348, -0.1314]], device='cuda:0', grad_fn=)

means2D sample: tensor([[0., 0., 0.], [0., 0., 0.]], device='cuda:0', grad_fn=)

colors_precomp sample: tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]], device='cuda:0', grad_fn=)

opacities sample: tensor([[nan], [nan]], device='cuda:0', grad_fn=)

scales sample: tensor([[nan, nan, nan], [nan, nan, nan]], device='cuda:0', grad_fn=)

rotations sample: tensor([[0., nan, nan, nan], [0., nan, nan, nan]], device='cuda:0', grad_fn=)
"""
There are lots of NaNs. Is this normal? Could you please help me find a solution? Thanks a lot.
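For anyone debugging the same thing: printing the first two rows only shows a sample, so a count of non-finite entries per tensor can be more telling. Below is a minimal sketch, not code from the repository. The helper name `count_nonfinite` and the stand-in tensors are hypothetical; in the real code the dict would hold means3D, colors_precomp[b], opacity[b], scales[b] and rotations[b].

```python
import torch


def count_nonfinite(named_tensors):
    """Return {name: number of NaN/Inf entries} for each tensor in the dict."""
    report = {}
    for name, tensor in named_tensors.items():
        # torch.isfinite is False for NaN, +Inf and -Inf
        bad = (~torch.isfinite(tensor)).sum().item()
        report[name] = int(bad)
    return report


# Stand-in tensors (assumption); replace with the actual renderer inputs.
example = {
    "means3D": torch.zeros(4, 3),
    "scales": torch.full((4, 3), float("nan")),
}
print(count_nonfinite(example))
```

Calling this right before the rasterizer shows whether every entry of a tensor is NaN or only part of it, which helps narrow down where the values first go bad.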

jeb0813 commented 4 months ago

I hit the same error. Retraining meshhead, or rolling back to an earlier meshhead checkpoint, worked for me. I have no idea what causes this error; it just happens sometimes. :D
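If the NaNs do come from the meshhead checkpoint, a quick check before starting train_gaussianhead.py might save a run. This is only a sketch under assumptions: it treats the checkpoint as a flat state dict of tensors (the repository's actual checkpoint layout is not confirmed here), and `nonfinite_entries` is a hypothetical helper. A small `torch.nn.Linear` stands in for the real model so the example is self-contained.

```python
import torch


def nonfinite_entries(state_dict):
    """Return the keys whose tensors contain at least one NaN or Inf."""
    bad = []
    for key, value in state_dict.items():
        # Skip non-tensor entries such as step counters or config values
        if torch.is_tensor(value) and not torch.isfinite(value).all():
            bad.append(key)
    return bad


# Stand-in model (assumption); a real check would pass the loaded checkpoint.
layer = torch.nn.Linear(2, 2)
with torch.no_grad():
    layer.weight[0, 0] = float("nan")  # simulate a corrupted parameter
print(nonfinite_entries(layer.state_dict()))
```

For a checkpoint on disk, the dict would come from `torch.load(path, map_location="cpu")`, with `path` pointing at whichever meshhead checkpoint you plan to use. An empty list suggests the saved parameters themselves are finite, so the NaNs would have to come from somewhere else.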