ScheiklP / sofa_zoo

Reinforcement learning scripts for sofa_env environments.
MIT License

Rope_threading #6

Closed: wjyustl closed this issue 5 months ago

wjyustl commented 5 months ago

https://github.com/ScheiklP/sofa_zoo/assets/129915667/e1542f22-3a98-4b6a-a736-4bbc379401c0

Hi @ScheiklP, the Rope_threading scenario is currently configured with eye_configs = "1" and bimanual_grasp = True.

I imagine that once the right gripper completes its goal (sending the rope into the loop), the right gripper could stay still while the left gripper grasps the rope.

I think this would improve the left gripper's success rate in reaching the rope. Am I right? How should I do this?

ScheiklP commented 5 months ago

Hi @wjyustl, since the degrees of freedom are independent, I don't think it would make a large difference. You could try adapting the _do_action function (depending on which one you actually use) and masking out the values of the right gripper if the active eye's state is TRANSITION.

E.g. here: https://github.com/ScheiklP/sofa_env/blob/main/sofa_env/scenes/rope_threading/rope_threading_env.py#L375-L382

action[:5] = 0.0
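A minimal sketch of the masking idea, kept separate from sofa_env so it can be tried in isolation. The `EyeState` enum and the `mask_right_gripper` helper are hypothetical names, not part of the library. The slice `[:5]` follows the example above and should be checked against the action layout of the environment you actually use.

```python
from enum import Enum, auto

import numpy as np


class EyeState(Enum):
    # Hypothetical stand-in for the environment's eye states.
    OPEN = auto()
    TRANSITION = auto()
    DONE = auto()


def mask_right_gripper(action: np.ndarray, eye_state: EyeState, n_dof: int = 5) -> np.ndarray:
    """Return a copy of `action` with the first `n_dof` values zeroed during TRANSITION.

    Assumption: the first `n_dof` entries belong to the right gripper,
    as in the `action[:5] = 0.0` example.
    """
    masked = np.array(action, dtype=float, copy=True)
    if eye_state is EyeState.TRANSITION:
        masked[:n_dof] = 0.0
    return masked
```

Working on a copy leaves the agent's original action untouched, which helps if the unmasked action is logged or reused elsewhere.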

Cheers, Paul

wjyustl commented 5 months ago

Hi @ScheiklP, I modified the environment code according to your suggestion, but there is still a problem: after the right gripper completes the task and holds its position, the left gripper remains open.

https://github.com/ScheiklP/sofa_zoo/assets/129915667/fadb0cc4-c9cc-4f6e-ac23-909a0e4bf527

Here are the changes I made to the code:

[Image: Snipaste_2023-11-28_16-29-47]

Thanks for your reply!

ScheiklP commented 5 months ago

Hi @wjyustl, yes, the reward for actually grasping the rope with the left gripper is sparse: https://github.com/ScheiklP/sofa_env/blob/main/sofa_env/scenes/rope_threading/rope_threading_env.py#L601-L628

The agent gets a dense reward for moving the left gripper to the rope, but the grasp itself is rewarded only once, at the moment it happens.

You could try modifying the reward function to add a feature that rewards closing the left gripper at the right time. Something like "if the rope is between the gripper jaws of the left gripper, when the eye is in state TRANSITION, give positive rewards for closing the gripper, and negative rewards for opening it".
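A minimal sketch of such a shaping term, written as a standalone function rather than as a patch to the environment's reward code. The function name, its arguments, and the convention that a smaller jaw angle means "more closed" are all assumptions for illustration. How to obtain the jaw angle and the "rope between jaws" flag depends on the environment.

```python
def left_grasp_shaping_reward(
    rope_between_jaws: bool,
    eye_in_transition: bool,
    previous_angle: float,
    current_angle: float,
    scale: float = 1.0,
) -> float:
    """Dense shaping term for closing the left gripper at the right time.

    Assumption: the jaw angle decreases as the gripper closes.
    Returns a positive value when the jaws closed since the last step,
    a negative value when they opened, and 0.0 when the conditions
    (rope between the jaws, eye in TRANSITION) are not met.
    """
    if not (rope_between_jaws and eye_in_transition):
        return 0.0
    return scale * (previous_angle - current_angle)
```

Because the term is proportional to the change in angle, holding the gripper still yields zero reward, so the agent is not paid for simply waiting next to the rope.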

wjyustl commented 5 months ago

@ScheiklP How do I tell if the rope is between the gripper jaws of the left gripper?

ScheiklP commented 5 months ago

Hi @wjyustl, you could try something like this
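A minimal sketch of one possible geometric check, offered as an illustration only and not as the snippet from this reply. It treats the space between the jaws as a cylinder around the line from the jaw hinge to the jaw tip and tests whether any rope point falls inside it. The function name, the arguments, and the cylinder model are assumptions. The positions would have to be read from the simulation in whatever way the environment exposes them.

```python
import numpy as np


def rope_between_jaws(rope_points, jaw_base, jaw_tip, max_distance: float) -> bool:
    """Return True if any rope point lies within `max_distance` of the
    segment from `jaw_base` to `jaw_tip` (a simple cylinder test).

    All positions are 3D points in the same coordinate frame (assumption).
    """
    points = np.asarray(rope_points, dtype=float).reshape(-1, 3)
    base = np.asarray(jaw_base, dtype=float)
    axis = np.asarray(jaw_tip, dtype=float) - base
    length = np.linalg.norm(axis)
    if length == 0.0:
        return False  # degenerate gripper geometry, nothing to test against
    axis = axis / length

    rel = points - base
    along = rel @ axis  # signed distance along the jaw axis
    perpendicular = np.linalg.norm(rel - np.outer(along, axis), axis=1)

    inside = (along >= 0.0) & (along <= length) & (perpendicular <= max_distance)
    return bool(inside.any())
```

The threshold `max_distance` would typically be chosen close to half the jaw opening width, so that the check only fires when closing the gripper can actually catch the rope.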