Closed AlvinYH closed 3 weeks ago
Hi, thanks for your attention.
CUDA_VISIBLE_DEVICES=0 python xxx
.FLAME.obj
file for convenience. Let me know if you have any other questions :)
@AlvinYH For question 2, I think you may be using torch 2.0+. I debugged the code and found a solution. On lines 111-113 of train.py, self.model is compiled by torch, which turns it from a DLMesh into an optimized model that cannot read the initialized retarget_pose sequence. I deleted these three lines and it works for me. :)
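A guess at the mechanism, sketched in plain Python with hypothetical `Mesh` and `Wrapper` classes (this is not the STAR code or torch's actual wrapper): if the compiled object forwards attribute reads to the original module but keeps attribute writes on itself, then setting retarget_pose from the trainer would never reach the DLMesh instance.

```python
class Mesh:
    """Hypothetical stand-in for the wrapped model."""

    def __init__(self):
        self.retarget_pose = None

    def pose(self):
        # Runs against the Mesh instance, so it only sees Mesh's own attribute.
        return self.retarget_pose


class Wrapper:
    """Hypothetical stand-in for a compiled wrapper around the model."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        # Only called when normal lookup fails: forward reads to the wrapped object.
        return getattr(self._inner, name)


mesh = Mesh()
wrapped = Wrapper(mesh)

# The write lands on the wrapper, not on the wrapped Mesh.
wrapped.retarget_pose = [0.1, 0.2]
before = wrapped.pose()   # still None inside Mesh

# Writing to the wrapped object directly is what the method actually sees.
mesh.retarget_pose = [0.1, 0.2]
after = wrapped.pose()
```

If that is what happens here, either skipping the compile step (as above) or setting the attribute on the underlying model should behave the same way.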
@czh-98 Thanks for your reply! As @Jackiemin233 mentioned, I did use torch 2.0+, and deleting lines 111-113 fixed the bug. Thank you both!
However, when using torch 2.0+, I encountered an in-place operation error:
RuntimeError: one of the variables needed for gradient computation has been modified by an in-place operation: [torch.cuda.FloatTensor [25193]], which is output 0 of LinalgVectorNormBackward0, is at version 1; expected version 0 instead.
This error traces back to line 161 in lib/guidance/shape_reg.py, during the computation of the Laplacian smoothness loss. I have since downgraded to torch 1.12 and resumed training, but I'm curious about the cause of this bug and whether there is a solution other than downgrading the torch version.
I tried torch 2.0+ and found that this issue is caused by the in-place operation loss[get_flame_vertex_idx()] *= 5. I modified it to avoid such operations, for example changing a += b to a = a + b. With that change it should also work for torch 2.0+.
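For reference, a minimal sketch of one way to do the same weighting out of place. The function name `weighted_norm_loss`, the toy tensors, and the index tensor are made up for illustration and are not the code in shape_reg.py; the idea is to put the factor of 5 into a separate constant weight tensor instead of writing into the output of the norm, which autograd keeps for the backward pass.

```python
import torch


def weighted_norm_loss(delta, face_idx, face_weight=5.0):
    """Per-vertex norm loss with extra weight on selected vertices (illustrative)."""
    # Autograd saves this output for the backward pass of the norm,
    # so writing into it (e.g. loss[face_idx] *= 5) invalidates the graph.
    loss = torch.linalg.norm(delta, dim=1)

    # Build the weights in a separate tensor that does not require grad;
    # in-place assignment on this constant is harmless.
    weights = torch.ones_like(loss)
    weights[face_idx] = face_weight

    # Out-of-place multiply: creates a new tensor and leaves `loss` untouched.
    return (loss * weights).mean()


# Toy data: three "vertices" with 2D offsets; vertex 0 plays the face region.
delta = torch.tensor([[3.0, 4.0], [0.0, 2.0], [1.0, 0.0]], requires_grad=True)
face_idx = torch.tensor([0])

loss = weighted_norm_loss(delta, face_idx)
loss.backward()
```

The per-vertex norms here are 5, 2 and 1, so the weighted mean is (25 + 2 + 1) / 3, and the backward pass runs without the version-counter error.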
Thank you for publicly releasing your code! However, I encountered several problems while training the model:
1. The mask and dense face are not on the same device. I resolved this by moving the mask to the GPU.
2. The retarget_pose attribute in the trainer class does not seem to alter its value. This causes a bug at https://github.com/czh-98/STAR/blob/master/lib/dlmesh.py#L874 because retarget_pose remains None. I'm unsure of the underlying reason, but I fixed this by encapsulating the function that sets retarget_pose within the dlmesh class.

Thank you!
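For the first problem, the workaround looks roughly like this. The tensors below are hypothetical stand-ins with made-up shapes, not the actual variables from the repository; the point is only that the mask is moved to whatever device the dense face tensor lives on before it is used for indexing.

```python
import torch

# Hypothetical stand-ins: the dense face data lives on the GPU when one is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
dense_face = torch.zeros(4, 3, device=device)

# A mask created without a device argument ends up on the CPU.
mask = torch.tensor([True, False, True, False])

# Move the mask to the same device as the tensor it indexes.
mask = mask.to(dense_face.device)
selected = dense_face[mask]
```

Using `dense_face.device` rather than a hard-coded "cuda" keeps the snippet working on CPU-only machines as well.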