I would like to clarify something about the learning-curve figures in the current version of the manuscript. It seems that TD-MPC2 uses action repeat=2, in which case the `steps` value in the TD-MPC2 config should be halved in order to perform 10M environment steps. However, the default `steps` value in the TD-MPC2 config for humanoid bench is set to 10M.
Were the actual environment steps taken into account, or has this been overlooked?
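To make the arithmetic behind the question concrete, here is a minimal sketch of the relationship being assumed between the configured step count and environment steps. The function names are hypothetical and are not taken from the TD-MPC2 codebase, and the sketch assumes that the configured value counts agent decisions, each of which is repeated `action_repeat` times in the environment.

```python
def env_steps(agent_steps: int, action_repeat: int) -> int:
    """Environment steps consumed when each agent action is repeated.

    Assumption: `agent_steps` counts agent decisions, not simulator steps.
    """
    return agent_steps * action_repeat


def agent_steps_for_budget(env_step_budget: int, action_repeat: int) -> int:
    """Agent decisions that fit inside a fixed environment-step budget."""
    return env_step_budget // action_repeat


budget = 10_000_000  # the 10M environment steps discussed above

# With action repeat 2, the configured value would need to be halved.
print(agent_steps_for_budget(budget, action_repeat=2))

# Leaving the configured value at 10M with action repeat 2 would double the budget.
print(env_steps(10_000_000, action_repeat=2))

# With action repeat 1, the configured value and environment steps coincide.
print(env_steps(10_000_000, action_repeat=1))
```

Under these assumptions, a configured value of 10M only matches 10M environment steps when the action repeat is 1.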
Sorry, it seems that I completely misread the code and that you correctly used action repeat=1 for TD-MPC2.
I am closing this issue since there wasn't actually an issue in the first place :)
It would be awesome if this were to be clarified.
Best regards, Dongyoon