**Open** · Zach-Attach opened this issue 1 month ago
Regarding observation 4: the ML-Agents executable and the SB3 training code do not appear to be running in synchrony. The final logged timestep changes with each run, and altering the `total_timesteps` argument to `model.learn()` still leaves the logs stopping at apparently random timesteps. Setting the buffer size larger than `steps_per_episode * train_eps` makes the run continue until approximately the buffer size is reached, but it still ends at a random point before that.
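One possible explanation (an assumption, not confirmed from the SB3 or ML-Agents internals): if the Unity executable advances several simulation steps per decision request, the step counter moves in increments larger than 1, so the loop can overshoot `total_timesteps` by a run-dependent amount. A minimal pure-Python sketch of that counting behavior, with a hypothetical `steps_per_decision` parameter:

```python
def run_training(total_timesteps: int, steps_per_decision: int) -> int:
    """Simulate a step counter that advances in batches.

    Hypothetical illustration only: each env.step() is assumed to cover
    `steps_per_decision` simulation steps, so the loop exits at the first
    multiple of that batch size at or past `total_timesteps`.
    """
    t = 0
    while t < total_timesteps:
        t += steps_per_decision  # counter jumps by the batch size
    return t

print(run_training(1000, 5))  # divides evenly: stops at exactly 1000
print(run_training(1000, 7))  # does not divide evenly: overshoots to 1001
```

If the decision period (or number of parallel agents) varies between runs, the final logged timestep would vary in exactly the way described above.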
The training and test logs record many more values than expected. There are currently four observations around this problem:
1. Episodes initialize twice at the beginning (two step 0s).
2. The first condition runs for 56 steps and restarts, twice in the first episode.
3. The number of episodes is 5x what is expected (e.g. `test_episodes=20` results in 100 episodes run).
4. The final logged timestep varies between runs, as described above.
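A hedged sketch of what could produce the 5x count in observation 3 (the `agents_in_scene` multiplier is an assumption, e.g. five agent copies in the Unity scene, not something confirmed by the logs): if the logging callback fires once per agent per episode rather than once per episode, 20 test episodes yield 100 records.

```python
test_episodes = 20
agents_in_scene = 5  # assumed cause of the 5x multiplier (hypothetical)

# One record per (episode, agent) pair instead of one per episode.
episode_records = [
    {"episode": ep, "agent": agent}
    for ep in range(test_episodes)
    for agent in range(agents_in_scene)
]

print(len(episode_records))  # 100 logged "episodes" for 20 real ones
```

Checking how many Behavior/Agent instances exist in the scene, or whether the episode-end callback is registered once per agent, would confirm or rule this out.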