Closed by JeremyLinky 3 years ago
Should we set MAX_EPISODE_STEPS to the same value as T_EXP in the test? It seems that once the time step exceeds MAX_EPISODE_STEPS, the algorithm switches to processing another episode's data in _eval_checkpoint() of occant_exp_trainer.py.
Reset is called manually during training. The idea behind 1001 is that the episode should not end within the simulator before the training script resets it (see here). Technically, it can be any value larger than 1000; there is nothing special about 1001. This is not done for evaluation, though (see here). For evaluation, we rely on the simulator's own reset.
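To illustrate the reasoning in the reply above, here is a minimal toy sketch. It is not code from the repository: `ToySimulator` and `count_early_endings` are hypothetical names. It assumes the simulator marks an episode as over once its step count reaches MAX_EPISODE_STEPS, and that the training script resets manually after T_EXP steps.

```python
class ToySimulator:
    """Hypothetical stand-in for a simulator with a built-in step limit."""

    def __init__(self, max_episode_steps):
        self.max_episode_steps = max_episode_steps
        self.steps = 0

    def reset(self):
        self.steps = 0

    def step(self):
        """Advance one step; return True if the simulator ends the episode."""
        self.steps += 1
        # Assumption: the episode is over once the step limit is reached.
        return self.steps >= self.max_episode_steps


def count_early_endings(max_episode_steps, t_exp, num_episodes=3):
    """Count episodes that the simulator ends on its own within the
    T_EXP steps the training script runs before its manual reset."""
    sim = ToySimulator(max_episode_steps)
    early = 0
    for _ in range(num_episodes):
        sim.reset()  # manual reset issued by the training script
        for _ in range(t_exp):
            if sim.step():
                early += 1
                break
    return early


# Training-style setting: the limit is larger than T_EXP, so the
# simulator never ends an episode before the manual reset.
print(count_early_endings(max_episode_steps=1001, t_exp=1000))

# With the limit equal to T_EXP, the simulator ends every episode itself.
print(count_early_endings(max_episode_steps=1000, t_exp=1000))
```

Under these assumptions, any limit larger than T_EXP behaves the same as 1001, which matches the remark that nothing is special about that value.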
Hi, I noticed that MAX_EPISODE_STEPS is set to 1001 instead of 1000 in the training config file, while T_EXP is set to 1000, which is inconsistent. In the test config, however, MAX_EPISODE_STEPS and T_EXP are the same: both are 500. Could you explain the reason? Thanks!