hdadong opened this issue 3 days ago
After a careful check, I find that in the simulation experiments, ms.sac_state.replay_buffer also stores the newest environment rollout (even though clear_model_buffer_after_model_train is True) and uses it to update the SAC policy, which does not align with the real-to-synthetic data ratio of 0.06 proposed in the paper. A small sketch of how I understand that ratio is included below the link.
I hope you can respond. Thanks! The code is here: https://github.com/CLeARoboticsLab/ssrl/blob/main/ssrl/brax/training/agents/ssrl/train.py#L926
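For clarity, this is how I understand the real-to-synthetic ratio of 0.06 to be applied when a SAC training batch is sampled. This is only a minimal sketch of my reading of the paper; `sample_sac_batch`, `env_buffer`, and `model_buffer` are hypothetical names, not the repo's API:

```python
import numpy as np

def sample_sac_batch(env_buffer, model_buffer, batch_size=256, real_ratio=0.06):
    """Mix real (environment) and synthetic (model) transitions at a fixed ratio."""
    n_real = int(round(batch_size * real_ratio))   # e.g. ~15 of 256 transitions
    n_synth = batch_size - n_real                  # remainder comes from model rollouts
    real_idx = np.random.randint(0, len(env_buffer), size=n_real)
    synth_idx = np.random.randint(0, len(model_buffer), size=n_synth)
    return [env_buffer[i] for i in real_idx] + [model_buffer[i] for i in synth_idx]
```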
Another question: why is clear_model_buffer_after_model_train set to False in the hardware experiments? This means that all the data collected in the real world is used to update the SAC policy, rather than respecting the real-to-synthetic data ratio of 0.06. A short sketch of how I read this flag follows.
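Hypothetical pseudocode of the flag's effect as I understand it (not the repo's actual implementation):

```python
def maybe_clear_model_buffer(model_buffer, clear_model_buffer_after_model_train):
    # True (simulation): wipe the SAC buffer after each model fit, so it holds
    # only freshly hallucinated rollouts from the current model.
    # False (hardware): keep everything, so all previously inserted real-world
    # transitions remain available to later SAC updates.
    return [] if clear_model_buffer_after_model_train else model_buffer
```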
How much data is loaded by executing the load_rollout function? Does it load all the data collected in the hardware experiments?
In the hardware experiments, how many times does /ssrl/ssrl_hardware/ssrl_ros_go1/scripts/train.py need to be executed to train the policy?
The code is here: https://github.com/CLeARoboticsLab/ssrl/blob/main/ssrl_hardware/ssrl_ros_go1/scripts/train.py#L485 I wonder why you insert the real-world data into ms.sac_state.replay_buffer here. In the simulation experiments, ms.sac_state.replay_buffer stores only the hallucinated rollouts generated by the model, not the environment rollouts.
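To make the contrast I am asking about concrete, here is a rough, hypothetical sketch of the two behaviours as I read the scripts; fill_sac_buffer and all other names are illustrative and not the repo's actual API:

```python
def fill_sac_buffer(sac_buffer, real_transitions, hallucinated_transitions, hardware):
    if hardware:
        # ssrl_ros_go1/scripts/train.py (around L485): real-world transitions
        # appear to be inserted into ms.sac_state.replay_buffer directly.
        sac_buffer = sac_buffer + list(real_transitions)
    # Hallucinated (model-generated) rollouts are inserted in both settings.
    sac_buffer = sac_buffer + list(hallucinated_transitions)
    return sac_buffer
```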