Closed (Wangbo-CASIA closed this 1 week ago)
It works like this:
| o | o |
| a | a | a | a |
We actually use only the last 3 actions.
(Maybe I should update the config, haha; the current values confuse people.)
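To make the "last 3 actions" point concrete, here is a minimal sketch of the slicing. It assumes the policy picks the executed actions the way Diffusion Policy-style code usually does (`start = n_obs_steps - 1`, `end = start + n_action_steps`); the helper name `executed_actions`, the `action_dim` value, and the dummy prediction are hypothetical and only for illustration, not taken from the repo.

```python
import numpy as np

# Values from dp3.yaml as quoted in this issue.
horizon = 4
n_obs_steps = 2
n_action_steps = 4

# Hypothetical sizes, chosen only for this illustration.
batch = 1
action_dim = 3


def executed_actions(action_pred, n_obs_steps, n_action_steps):
    """Slice the predicted action sequence down to the actions that get executed.

    Assumption: the first executed action is the one aligned with the
    latest observation, i.e. index n_obs_steps - 1.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    # Slicing past the end of the horizon is silently clipped,
    # so fewer than n_action_steps actions may come back.
    return action_pred[:, start:end]


# Dummy stand-in for the network output of shape (batch, horizon, action_dim).
action_pred = np.zeros((batch, horizon, action_dim), dtype=np.float32)
action = executed_actions(action_pred, n_obs_steps, n_action_steps)

print(action_pred.shape)  # full prediction over the horizon
print(action.shape)       # actions that would actually be executed
```

Under that assumption the slice is `[1:5]` on a sequence of length 4, so only indices 1, 2 and 3 survive. The network still predicts all `horizon` steps, and the first predicted action is simply dropped at execution time.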
Then why is the action predicted by the network still (batch, 4, action_dim)? Also, as far as I can see, the loss mask does not mask out the first action.
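Hello, I have a question. dp3.yaml sets horizon: 4, n_obs_steps: 2, n_action_steps: 4, and each sample that is actually fetched consists of four consecutive frames. How does that give 2 observation steps plus 4 inference steps? (Shouldn't we normally take 6 consecutive frames?)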