Dear authors:
I am using the released code to reproduce the teacher network for in-hand object rotation in the simulator, but I am not able to achieve the results listed in the paper. In fact, the model only learns to grasp the object to keep it from falling, and the policy's rotation reward has a mean value of around 0.6 after about 4 days of training. I have not modified anything. I also found that the released pretrained teacher model performs well in simulation. Could you please help check whether the released training hyperparameters for the teacher policy are correct? Thanks in advance.