hdadong opened this issue 3 days ago
After a careful check, I find that in the simulation experiments, ms.sac_state.replay_buffer also stores the newest environment rollout (even though clear_model_buffer_after_model_train is True) and uses it to update the SAC policy, which does not align with the real-to-synthetic data ratio of 0.06 proposed in the paper. A small sketch of how I understand that ratio is included below the link.
I hope you can respond. Thanks! The code is here: https://github.com/CLeARoboticsLab/ssrl/blob/main/ssrl/brax/training/agents/ssrl/train.py#L926
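For clarity, this is how I understand the real-to-synthetic ratio of 0.06 to be applied when a SAC training batch is sampled. This is only a minimal sketch of my reading of the paper; `sample_sac_batch`, `env_buffer`, and `model_buffer` are hypothetical names, not the repo's API:

```python
import numpy as np

def sample_sac_batch(env_buffer, model_buffer, batch_size=256, real_ratio=0.06):
    """Mix real (environment) and synthetic (model) transitions at a fixed ratio."""
    n_real = int(round(batch_size * real_ratio))   # e.g. ~15 of 256 transitions
    n_synth = batch_size - n_real                  # remainder comes from model rollouts
    real_idx = np.random.randint(0, len(env_buffer), size=n_real)
    synth_idx = np.random.randint(0, len(model_buffer), size=n_synth)
    return [env_buffer[i] for i in real_idx] + [model_buffer[i] for i in synth_idx]
```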
Another question: why is clear_model_buffer_after_model_train set to False in the hardware experiments? This means that all the data collected in the real world is used to update the SAC policy, rather than respecting the real-to-synthetic data ratio of 0.06. A short sketch of how I read this flag follows.
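Hypothetical pseudocode of the flag's effect as I understand it (not the repo's actual implementation):

```python
def maybe_clear_model_buffer(model_buffer, clear_model_buffer_after_model_train):
    # True (simulation): wipe the SAC buffer after each model fit, so it holds
    # only freshly hallucinated rollouts from the current model.
    # False (hardware): keep everything, so all previously inserted real-world
    # transitions remain available to later SAC updates.
    return [] if clear_model_buffer_after_model_train else model_buffer
```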
How much data is loaded by executing the load_rollout function? Does it load all the data collected in the hardware experiments?
In the hardware experiments, how many times does /ssrl/ssrl_hardware/ssrl_ros_go1/scripts/train.py need to be executed to train the policy?
The code is here: https://github.com/CLeARoboticsLab/ssrl/blob/main/ssrl_hardware/ssrl_ros_go1/scripts/train.py#L485 I wonder why you insert the real-world data into ms.sac_state.replay_buffer here. In the simulation experiments, ms.sac_state.replay_buffer stores only the hallucinated rollouts generated by the model, not the environment rollouts.
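To make the contrast I am asking about concrete, here is a rough, hypothetical sketch of the two behaviours as I read the scripts; fill_sac_buffer and all other names are illustrative and not the repo's actual API:

```python
def fill_sac_buffer(sac_buffer, real_transitions, hallucinated_transitions, hardware):
    if hardware:
        # ssrl_ros_go1/scripts/train.py (around L485): real-world transitions
        # appear to be inserted into ms.sac_state.replay_buffer directly.
        sac_buffer = sac_buffer + list(real_transitions)
    # Hallucinated (model-generated) rollouts are inserted in both settings.
    sac_buffer = sac_buffer + list(hallucinated_transitions)
    return sac_buffer
```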