Closed zoezhou1999 closed 1 year ago
I have tried allocating the states inside torch.autograd.Function. It seems to make training more stable, but the NaN loss still appears sometimes, or the optimization is not as good as in the provided examples (supervising on the full video trajectory and using the Adam optimizer with lr=0.1). Is this performance normal, or did I do something wrong in the code? Thank you!
When I set train_iters to 100000, at some point this error is thrown:
```
Warp CUDA error 2: out of memory (/buildAgent/work/3db450722a274445/warp/native/warp.cu:215)
Traceback (most recent call last):
  File "example_sim_rigid_contact_grad_check_sample.py", line 253, in <module>
    robot.train()
  File "example_sim_rigid_contact_grad_check_sample.py", line 203, in train
    body_list = ForwardRenderingV3.apply(velocity, self)
  File "example_sim_rigid_contact_grad_check_sample.py", line 43, in forward
    ctx.states.append(ctx.model.state(requires_grad=True))
  File "/mnt/colab_public/datasets/zyhz/conda_env/zyhz/lib/python3.7/site-packages/warp/sim/model.py", line 606, in state
    self.body_count, dtype=wp.spatial_vector, device=s.body_q.device, requires_grad=requires_grad
  File "/mnt/colab_public/datasets/zyhz/conda_env/zyhz/lib/python3.7/site-packages/warp/context.py", line 2617, in zeros
    raise RuntimeError("Memory allocation failed on device: {} for {} bytes".format(device, num_bytes))
RuntimeError: Memory allocation failed on device: cuda:0 for 24 bytes
```
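The failure mode here is telling: the final allocation that fails is only 24 bytes, so the device is not running out because of one large buffer but because per-step state allocations accumulate across training iterations until nothing is left. A minimal pure-Python sketch of that pattern (FakeState and run_iteration are hypothetical stand-ins, not the Warp API; real states hold GPU buffers):

```python
class FakeState:
    """Stand-in for a warp.sim simulation state (real states hold GPU buffers)."""
    def __init__(self, nbytes=24):
        self.nbytes = nbytes

def run_iteration(num_steps, leaked_refs=None):
    """Allocate one state per substep; optionally simulate a reference leak."""
    states = [FakeState() for _ in range(num_steps)]
    if leaked_refs is not None:
        # simulated leak: states outlive the iteration instead of being freed
        leaked_refs.extend(states)
    return sum(s.nbytes for s in states)

# Leaky version: the number of live states grows linearly with train_iters,
# which is what eventually exhausts GPU memory in the long training run.
leak = []
for _ in range(3):
    run_iteration(num_steps=4, leaked_refs=leak)
print(len(leak))  # 12 states still alive after 3 iterations
```

If the states are allowed to go out of scope each iteration (the `leaked_refs=None` path), memory stays bounded regardless of train_iters.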
This growing-memory issue also occurs in the example_sim_fk_grad_torch.py example.
Hi @zoezhou1999, we recently fixed a memory leak with gradients when propagating back to Torch, please see this commit from @eric-heiden for details: 1de0d850ceeddf191b1718797f1f7dc120ec3e51.
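For context, the general shape of such a leak is that buffers saved on the autograd context for backward stay referenced after backward has consumed them. A minimal pure-Python sketch of the cleanup pattern (FakeState, FakeCtx, forward, and backward are hypothetical stand-ins, not the actual Warp or Torch code in that commit):

```python
class FakeState:
    """Stand-in for a warp.sim state holding GPU buffers."""
    pass

class FakeCtx:
    """Stand-in for the ctx object torch.autograd.Function passes around."""
    pass

def forward(ctx, num_steps):
    # Allocate one state per simulation substep; backward needs all of them
    # to propagate gradients through the trajectory.
    ctx.states = [FakeState() for _ in range(num_steps)]
    return len(ctx.states)

def backward(ctx):
    grads = len(ctx.states)  # pretend the saved states are consumed here
    ctx.states = None        # release the references so the underlying
                             # buffers can be freed before the next iteration
    return grads

ctx = FakeCtx()
forward(ctx, 4)
print(backward(ctx), ctx.states)  # 4 None
```

Without the `ctx.states = None` step, every training iteration's states remain reachable through the autograd graph, and memory grows until allocation fails.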
Hi, I am trying to optimize the velocity by supervising on a video trajectory. I found the performance is not as good as in the provided examples. Is this normal? Thank you!