Closed by Xianqi-Zhang 1 month ago.
Hi! I think you are getting confused by the order of joints and actuators in the XML. Try the following code in the `test_env.py` file:
```python
action_raw = np.zeros(env.action_space.shape)

# Map each joint name to its qpos address. A free joint occupies 7 qpos
# entries, so the running offset grows by 6 relative to the joint index.
joint_to_action = {}
offset = 0
for i in range(env.model.njnt):
    print(f"joint {i}: {env.model.joint(i).name}")
    jnt_name = env.model.joint(i).name
    if jnt_name.startswith("free"):
        joint_to_action[jnt_name] = range(i + offset, i + offset + 7)
        offset += 6
    else:
        joint_to_action[jnt_name] = i + offset

# Fill the action in actuator order, looking up each actuator's target by name.
for i in range(env.model.nu):
    print(f"actuator {i}: {env.model.actuator(i).name}")
    act_name = env.model.actuator(i).name
    if act_name.startswith("lh") or act_name.startswith("rh"):
        action_raw[i] = 0.0  # hand joints held at zero
    else:
        action_raw[i] = env.data.qpos[joint_to_action[act_name]]
```
Then, you just need to normalize:

```python
action = env.task.normalize_action(action_raw)
```
Note that I am setting the hand joints to zero above.
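The offset bookkeeping in the snippet exists because a MuJoCo free joint spans 7 qpos entries (a 3-D position plus a unit quaternion), while hinge and slide joints span one each. A minimal sketch of that addressing, with a toy joint list (the names and layout are illustrative, not read from any real model):

```python
# Toy check of free-joint qpos addressing: a "free" joint consumes 7 qpos
# slots, a "hinge" joint consumes 1, so hinge addresses shift by +6 per
# preceding free joint -- the same bookkeeping as the offset variable above.
qpos_width = {"free": 7, "hinge": 1}
joints = ["free", "hinge", "hinge"]  # hypothetical model layout

addr, addrs = 0, []
for name in joints:
    addrs.append(addr)        # starting qpos index of this joint
    addr += qpos_width[name]  # advance by the joint's qpos width

# The two hinge joints start after the 7 free-joint slots.
assert addrs == [0, 7, 8]
```

In the real model, `env.model.jnt_qposadr` gives these addresses directly, which avoids recomputing the offset by hand.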
Hi, thank you for your reply. I have tested your code and found the difference between the joint order and the actuator order (i.e., the right-arm part).

`action_src -> env -> qpos -> action_rec` (recovered from qpos)

With your code, `action_rec` is almost the same as `action_src`, e.g.:
`action_rec`:

```
[ 0.006953 0.002198 -0.031055 -0.078634 -0.321824 0.017591 0.017056
 -0.032843 -0.076613 -0.335486 0.001403 0.00309 -0.802331 -0.550045
 -0.353914 -0.828963 -0.002263 0.804218 0.545445 -0.350416 -0.816377
  0.5 0.176471 0. -1. 0. 0. -0.714287
  0. -0.714287 -1. 0. -0.714287 -1. 0.
 -0.714287 -1. -1. 0. -0.714287 -1. 0.5
  0.176471 0. -1. 0. 0. -0.714287 0.
 -0.714287 -1. 0. -0.714287 -1. 0. -0.714287
 -1. -1. 0. -0.714287 -1. ]
```

`action_src`:

```
[ 0.00618 0.002033 -0.031263 -0.078371 -0.321913 0.017153 0.016218
 -0.031071 -0.086236 -0.326936 0.001373 0.003056 -0.802221 -0.550039
 -0.354205 -0.828961 -0.00232 0.804327 0.545481 -0.350839 -0.816365
  0.5 0.176471 0. -1. 0. 0. -0.714287
  0. -0.714287 -1. 0. -0.714287 -1. 0.
 -0.714287 -1. -1. 0. -0.714287 -1. 0.5
  0.176471 0. -1. 0. 0. -0.714287 0.
 -0.714287 -1. 0. -0.714287 -1. 0. -0.714287
 -1. -1. 0. -0.714287 -1. ]
```
But for randomly sampled actions, they are different:

```python
action_src = env.action_space.sample()
action_src[21:] = 0  # Set the hand positions to 0.
```
`action_rec`:

```
[ 0.001173 0.051781 -0.027568 -0.031047 -0.427927 0.218858 -0.009215
 -0.03604 -0.111994 -0.286872 0.061891 -0.00056 -0.791568 -0.553081
 -0.333942 -0.84509 -0.003077 0.780766 0.539807 -0.333298 -0.775451
  0.5 0.176471 0. -1. 0. 0. -0.714287
  0. -0.714287 -1. 0. -0.714287 -1. 0.
 -0.714287 -1. -1. 0. -0.714287 -1. 0.5
  0.176471 0. -1. 0. 0. -0.714287 0.
 -0.714287 -1. 0. -0.714287 -1. 0. -0.714287
 -1. -1. 0. -0.714287 -1. ]
```

`action_rec_wo_normalize`:

```
[ 0.000504 0.022266 -0.383155 0.859141 -0.47241 0.094109 -0.003962
 -0.407172 0.765647 -0.374376 0.145444 -0.001606 0.019545 -0.015109
  0.035492 -0.016778 -0.00883 -0.038179 -0.023056 0.036735 0.043112
  0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. ]
```

`action_src`:

```
[-0.771802 -0.42334 0.136083 0.187762 -0.37183 0.73146 0.620496
 -0.219878 -0.832228 0.335425 0.427085 0.967805 -0.573363 0.306545
  0.437086 -0.923811 -0.961498 -0.128054 -0.146651 0.32453 -0.109606
  0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. ]
```
I also tested randomly sampled actions and normalized them before execution; the results are still different:

```python
action_src = env.action_space.sample()
action_src[21:] = 0
action_src = env.task.normalize_action(action_src)
```
`action_rec`:

```
[ 0.107697 -0.016432 0.005941 -0.224225 -0.207612 0.193516 -0.042482
  0.018831 -0.270997 -0.214824 -0.01888 0.014548 -0.793247 -0.556336
 -0.335144 -0.85398 -0.008045 0.819778 0.555294 -0.333 -0.802578
  0.5 0.176471 0. -1. 0. 0. -0.714287
  0. -0.714287 -1. 0. -0.714287 -1. 0.
 -0.714287 -1. -1. 0. -0.714287 -1. 0.5
  0.176471 0. -1. 0. 0. -0.714287 0.
 -0.714287 -1. 0. -0.714287 -1. 0. -0.714287
 -1. -1. 0. -0.714287 -1. ]
```

`action_src`:

```
[ 2.051246 -0.176114 0.136142 -1.52508 0.544188 1.317612 -0.113427
  0.443373 -1.526157 -0.269055 -0.146196 0.122162 -0.744831 -0.780355
  0.128492 -1.786527 -0.065511 1.003311 0.624341 -0.389015 -0.638936
  0.5 0.176471 0. -1. 0. 0. -0.714287
  0. -0.714287 -1. 0. -0.714287 -1. 0.
 -0.714287 -1. -1. 0. -0.714287 -1. 0.5
  0.176471 0. -1. 0. 0. -0.714287 0.
 -0.714287 -1. 0. -0.714287 -1. 0. -0.714287
 -1. -1. 0. -0.714287 -1. ]
```
Two questions:
Thanks in advance for any reply. Best regards.
The code I posted above is the correct way to command the body joints to hold their current qpos. The initial keyframe is not passively stable; that is why the robot eventually falls down.
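One way to see why a freshly sampled action and the qpos it produces need not match is that a position-controlled joint only reaches its commanded target at steady state. A toy 1-DoF position servo illustrates this (the PD update, gains, and timestep below are illustrative, not MuJoCo's actual actuator model):

```python
# Toy 1-DoF position servo: qpos is driven toward the ctrl target by a
# PD-like update. During the transient, qpos != target, so an "action"
# recovered from qpos mid-motion differs from the commanded action.
def step(qpos, qvel, target, kp=50.0, kd=5.0, dt=0.01):
    acc = kp * (target - qpos) - kd * qvel  # PD acceleration toward target
    qvel = qvel + acc * dt
    qpos = qpos + qvel * dt
    return qpos, qvel

qpos, qvel, target = 0.0, 0.0, 0.7

qpos, qvel = step(qpos, qvel, target)
# After one step, qpos is still far from the target...
assert abs(qpos - target) > 0.1

# ...but after many steps the servo settles and qpos ~= target.
for _ in range(2000):
    qpos, qvel = step(qpos, qvel, target)
assert abs(qpos - target) < 1e-3
```

This matches the observation above: holding the current pose (a near-steady state) round-trips well, while a random target sampled mid-motion does not.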
Thank you.
Hi, thanks for sharing. I am very sorry to bother you so many times; I have been following this work for a long time.
My work requires explicitly estimating the fine-tuning residual from the current pose, so I need to convert qpos into an action. Your previous reply said that the robot in the env is position-controlled, so I tried to select the body-related variables in qpos as actions, but it does not seem right.
To achieve this function (qpos -> action), I first execute some actions and record data, and then execute the actions generated from qpos, as follows.
I also tried adding `normalize_action()`/`unnormalize_action()`, but it seems not right, because the two simulation results are very different.
Test settings:
So the test env uses `ObservationWrapper`, and the action is unnormalized before being executed:
https://github.com/carlosferrazza/humanoid-bench/blob/376aec79a85ea5df855d1e0e72b00c750baafd7b/humanoid_bench/wrappers.py#L564-L565
https://github.com/carlosferrazza/humanoid-bench/blob/376aec79a85ea5df855d1e0e72b00c750baafd7b/humanoid_bench/tasks.py#L59-L61
I think the qpos-derived action needs to be normalized (the opposite of unnormalize), but the visualized results of `unnormalize_action` look better than those of `normalize_action`. This confuses me.
A frame of my test sequence (Frame 05), comparing: src action, directly-copied qpos, normalized qpos, and unnormalized qpos (screenshots omitted).
I have no idea about:
Thanks in advance for any reply.
Best regards.