Very nice work! I have a question about the data pre-processing for image input. I notice that the data is pre-processed during training, but during evaluation, as shown in `/docs/modules/algorithms.md`:
```python
policy.start_episode()
obs = env.reset()
horizon = 400
total_return = 0
for step_i in range(horizon):
    # get action from policy (calls @get_action)
    act = policy(obs)
    # play action
    next_obs, r, done = env.step(act)
    total_return += r
    success = env.is_success()["task"]
    if done or success:
        break
    obs = next_obs  # advance to the next observation
```
It seems that there is no data preprocessing for the input here. Should we add the preprocessing step ourselves, or is it already included in policy inference? Thank you!
Great question! The same operations will happen at eval-time, but in different locations.
Observation processing happens in the environment (for example here), and observation normalization occurs as part of the RolloutPolicy object that wraps the model during inference (see here).
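To make the division of responsibilities concrete, here is a minimal sketch of how a RolloutPolicy-style wrapper can apply normalization at inference time so the caller never has to preprocess observations manually. The class and attribute names below are illustrative assumptions, not robomimic's actual API:

```python
import numpy as np

class NormalizingRolloutPolicy:
    """Illustrative wrapper: normalizes observations at inference time,
    mirroring the normalization applied during training."""

    def __init__(self, model, obs_mean, obs_std):
        self.model = model
        self.obs_mean = obs_mean
        self.obs_std = obs_std

    def start_episode(self):
        # reset any per-episode state the model keeps (no-op here)
        pass

    def __call__(self, obs):
        # apply the same normalization the training pipeline used,
        # so the underlying model always sees normalized inputs
        norm_obs = (obs - self.obs_mean) / (self.obs_std + 1e-8)
        return self.model(norm_obs)

# usage: a dummy "model" that just scales its (already normalized) input
model = lambda o: 0.1 * o
policy = NormalizingRolloutPolicy(
    model,
    obs_mean=np.array([0.5, 0.5]),
    obs_std=np.array([0.25, 0.25]),
)
policy.start_episode()
# the eval loop passes raw observations; normalization happens inside
act = policy(np.array([1.0, 0.0]))
```

This is why the evaluation loop in the docs can call `policy(obs)` on raw observations: the wrapper, not the loop, owns the preprocessing.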