Open YunchuZhang opened 2 years ago
As the title says. I found that torch_nn trains a small network that maps states to actions. What is it used for?
And is it possible to add force information to the environment state?
Thanks

The network plays a role similar to a policy in RL. Instead of optimizing it with an approximated policy gradient or with gradients from a critic function, as RL does, we optimize it directly with gradients from the simulator. Please refer to https://github.com/taichi-dev/difftaichi for more detail.

On the second question: no. Our simulator is kinematics-based, without explicit dynamics; the action is actually the end effector's velocity. If you mean the internal forces computed inside MPM, that is possible, but you may need to consult more documentation on MPM.
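
To make the idea of optimizing a state-to-action network with gradients from the simulator concrete, here is a minimal sketch. It is not the repository's torch_nn code or its MPM simulator: it assumes a toy 1D kinematic "simulator" in which the action is a velocity, a hypothetical linear policy `v = w * (target - x) + b`, and a hand-written backward pass standing in for the automatic differentiation that a differentiable simulator would provide. All function and parameter names are made up for illustration.

```python
# Toy illustration only: a 1D kinematic rollout where the policy outputs a
# velocity, and the policy parameters are trained with the exact gradient
# of the final-state loss, obtained by differentiating through the rollout.

def rollout(w, b, x0, target, steps=20, dt=0.1):
    """Kinematic update x_{t+1} = x_t + dt * v_t, with v_t from the policy."""
    xs = [x0]
    for _ in range(steps):
        v = w * (target - xs[-1]) + b  # hypothetical linear policy: state -> action
        xs.append(xs[-1] + dt * v)     # no forces or dynamics, position only
    return xs


def loss_and_grads(w, b, x0, target, steps=20, dt=0.1):
    """Return the loss (x_T - target)^2 and its gradients w.r.t. w and b."""
    xs = rollout(w, b, x0, target, steps, dt)
    loss = (xs[-1] - target) ** 2

    # Backward pass through the simulator steps (reverse-mode, by hand).
    g = 2.0 * (xs[-1] - target)        # dL/dx_T
    gw, gb = 0.0, 0.0
    for t in reversed(range(steps)):
        gw += g * dt * (target - xs[t])  # dx_{t+1}/dw = dt * (target - x_t)
        gb += g * dt                     # dx_{t+1}/db = dt
        g *= 1.0 - dt * w                # dx_{t+1}/dx_t = 1 - dt * w
    return loss, gw, gb


def train(iters=200, lr=0.05, x0=0.0, target=1.0):
    """Plain gradient descent on the policy parameters."""
    w, b = 0.0, 0.0
    losses = []
    for _ in range(iters):
        loss, gw, gb = loss_and_grads(w, b, x0, target)
        losses.append(loss)
        w -= lr * gw
        b -= lr * gb
    return w, b, losses


if __name__ == "__main__":
    w, b, losses = train()
    print("first loss:", losses[0], "last loss:", losses[-1])
```

The contrast with RL is that no reward sampling or critic is involved here: the update direction is the exact derivative of the loss with respect to the policy parameters, propagated back through every simulation step.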