**Open** · Zach-Attach opened this issue 1 month ago
Regarding observation 4: the ML-Agents executable and the SB3 training code do not appear to be running in synchrony. The final logged timestep changes with each run, and altering the `total_timesteps` argument to `model.learn()` still leaves the logs stopping at apparently random timesteps. Setting the buffer size larger than `steps_per_episode * train_eps` makes the run continue until approximately the buffer size is reached, but it still ends at a random point before that.
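One possible explanation (an assumption, not confirmed from the SB3 or ML-Agents internals): if the Unity executable advances several simulation steps per decision request, the step counter moves in increments larger than 1, so the loop can overshoot `total_timesteps` by a run-dependent amount. A minimal pure-Python sketch of that counting behavior, with a hypothetical `steps_per_decision` parameter:

```python
def run_training(total_timesteps: int, steps_per_decision: int) -> int:
    """Simulate a step counter that advances in batches.

    Hypothetical illustration only: each env.step() is assumed to cover
    `steps_per_decision` simulation steps, so the loop exits at the first
    multiple of that batch size at or past `total_timesteps`.
    """
    t = 0
    while t < total_timesteps:
        t += steps_per_decision  # counter jumps by the batch size
    return t

print(run_training(1000, 5))  # divides evenly: stops at exactly 1000
print(run_training(1000, 7))  # does not divide evenly: overshoots to 1001
```

If the decision period (or number of parallel agents) varies between runs, the final logged timestep would vary in exactly the way described above.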
The training and test logs record many more values than expected. There are currently four observations around this problem:
1. Episodes initialize twice at the beginning (two step 0s).
2. The first condition runs for 56 steps and restarts, twice in the first episode.
3. The number of episodes is 5x what is expected (e.g. `test_episodes=20` results in 100 episodes run).
4. The final logged timestep varies between runs, as described above.
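A hedged sketch of what could produce the 5x count in observation 3 (the `agents_in_scene` multiplier is an assumption, e.g. five agent copies in the Unity scene, not something confirmed by the logs): if the logging callback fires once per agent per episode rather than once per episode, 20 test episodes yield 100 records.

```python
test_episodes = 20
agents_in_scene = 5  # assumed cause of the 5x multiplier (hypothetical)

# One record per (episode, agent) pair instead of one per episode.
episode_records = [
    {"episode": ep, "agent": agent}
    for ep in range(test_episodes)
    for agent in range(agents_in_scene)
]

print(len(episode_records))  # 100 logged "episodes" for 20 real ones
```

Checking how many Behavior/Agent instances exist in the scene, or whether the episode-end callback is registered once per agent, would confirm or rule this out.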