I am trying to set a target return to train my Decision Transformer (DT), but the code raises a number of errors. Adding elements to the `target_envs` list leads to errors in `evaluate_episode_rtg`, since the key `'env'` is not defined anywhere. Adding target-return information at other points in the training pipeline, such as
for iteration in range(last_iter + 1, config['max_iters']):
    outputs = trainer.train_iteration(num_steps=config['num_steps_per_iter'], iter_num=iteration + 1, print_logs=True)
    eval_rets = [outputs[f'evaluation/target{target_rew}_return_mean'] for target_rew in config['env_targets']]
    mean_ret = np.mean(eval_rets)
raises a KeyError since, of course, no keys of that kind are produced by `train_iteration`.
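For context, here is my understanding of how those evaluation keys are supposed to be produced, as a runnable sketch. The names `eval_episodes` and `evaluate_episode_rtg`, and the exact key format (e.g. `target_{target_rew}` vs `target{target_rew}`), are assumptions based on the public decision-transformer experiment script, so they may not match this codebase exactly:

```python
import numpy as np

def evaluate_episode_rtg(target_return):
    # Stub standing in for the real environment rollout;
    # returns (episode_return, episode_length).
    return target_return * 0.9, 100

def eval_episodes(target_rew, num_eval_episodes=10):
    # Builds one evaluation function per target return; the trainer
    # calls each of these once per iteration.
    def fn(model=None):
        returns, lengths = [], []
        for _ in range(num_eval_episodes):
            ret, length = evaluate_episode_rtg(target_rew)
            returns.append(ret)
            lengths.append(length)
        # These key names are what the outer training loop later reads
        # back from the outputs dict (prefixed with 'evaluation/').
        return {
            f'target_{target_rew}_return_mean': np.mean(returns),
            f'target_{target_rew}_return_std': np.std(returns),
        }
    return fn

env_targets = [3600, 1800]
eval_fns = [eval_episodes(t) for t in env_targets]

logs = {}
for fn in eval_fns:
    for k, v in fn().items():
        logs[f'evaluation/{k}'] = v
```

If the keys built by the evaluation functions don't exactly match the f-strings used in the training loop, the lookup fails with precisely this kind of KeyError.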
So I'm wondering: how does training work if no target return is provided?
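For reference, my current understanding is that at training time the returns-to-go are computed directly from each trajectory's rewards in the dataset, so no external target is needed; the target return only matters at evaluation time. A minimal sketch of that computation (the name `discount_cumsum` follows the public decision-transformer code, so treat it as an assumption):

```python
import numpy as np

def discount_cumsum(rewards, gamma=1.0):
    # Reverse cumulative sum: rtg[t] = r[t] + gamma * rtg[t+1],
    # i.e. each timestep is conditioned on the sum of its own
    # trajectory's future rewards.
    rtg = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg

rewards = np.array([1.0, 0.0, 2.0, 1.0])
print(discount_cumsum(rewards))  # [4. 3. 3. 1.]
```

Is this correct, and is that why no target return needs to be supplied during training?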
Thanks!