Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Commandline Statistic Output #3737

Closed — Xiromtz closed this issue 4 years ago

Xiromtz commented 4 years ago

Hello, I don't have a complete grasp of the Python interface or how mlagents-learn really works, but I believe the statistics I get in my terminal are simply the mean and standard deviation of the reward, printed every x timesteps. I am not 100% sure, but I believe there is (or was) an output that is sent whenever an episode ends for the academy, i.e. the whole environment. In my current implementation, I don't see any output when my agent is done and a new episode starts.

I'd like an output that gives statistics like: episode ended after x timesteps, with y cumulative reward, etc. Does this already exist and did I do something wrong for it not to be sent? Or do I have to code the output myself, and how would I even do that? I don't see anything in the docs other than the statement that statistics are output at an interval of x timesteps.
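For illustration, here is a rough sketch of the kind of per-episode logging I have in mind. This is hypothetical pseudocode against a generic Gym-style environment — `DummyEnv`, `reset`, and `step` are my own placeholders, not the ML-Agents API:

```python
import random

class DummyEnv:
    """Stand-in for an episodic environment; not an ML-Agents class."""
    def __init__(self, max_steps=50):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = -1.0 + random.random()  # per-step penalty plus noise
        done = self.t >= self.max_steps or random.random() < 0.02
        return 0.0, reward, done

env = DummyEnv()
for episode in range(5):
    obs = env.reset()
    cumulative_reward, steps = 0.0, 0
    done = False
    while not done:
        obs, reward, done = env.step(action=None)
        cumulative_reward += reward
        steps += 1
    # The per-episode line I would like mlagents-learn to print:
    print(f"Episode {episode} ended after {steps} steps, "
          f"cumulative reward {cumulative_reward:.2f}")
```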

awjuliani commented 4 years ago

Hi @Xiromtz

There are many additional statistics about the training process which are recorded and presented in the TensorBoard interface. See here: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Using-Tensorboard.md for how to use it, and the list of statistics it presents. I believe it should meet your needs. If not, please feel free to reopen this issue.

Xiromtz commented 4 years ago

I was thinking about how to correctly formulate the question I originally intended to ask. Having been occupied with other work, I can only now get back to it:

I know about the TensorBoard statistics. I have been using them for a while now and they work nicely for most use cases. In my opinion, though, some information is still missing.

My original question was asking for per-episode feedback on my agent. Using TensorBoard, this does not really work the way I would like. I believe the TensorBoard feedback assumes a continuous environment, whereas my environment is episodic.

In short, the feedback I need is rewards per episode, regardless of time-steps. Apparently no statistics exist on a per-episode basis, only as averages over all time-steps. Since my agent already receives a negative reward every time-step, episodic rewards suffice and are easier for me to interpret.

Would it be possible for me to somehow edit the way information is written to the CSV/TensorBoard files?
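Something like the following is what I imagine — a standalone writer logging one scalar per episode next to ML-Agents' own summaries. This is purely hypothetical: `torch.utils.tensorboard`, the log directory path, and the `on_episode_end` helper are my own assumptions, not the mechanism the trainer uses internally:

```python
# Hypothetical: write per-episode scalars with an independent
# SummaryWriter, pointing at a subdirectory of the logdir that
# TensorBoard is already watching.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="summaries/custom_episode_stats")

def on_episode_end(episode_index, steps, cumulative_reward):
    # x-axis is the episode index here, not the total step count
    writer.add_scalar("Episode/Length", steps, episode_index)
    writer.add_scalar("Episode/CumulativeReward",
                      cumulative_reward, episode_index)
    writer.flush()
```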

awjuliani commented 4 years ago

Hi @Xiromtz

Per-episode reward is exactly what is being displayed in the TensorBoard statistics. The cumulative reward plot shows the mean cumulative reward of the episodes that completed during each reporting interval; the x-axis is the total number of simulation steps, but each value is computed per episode.

Xiromtz commented 4 years ago

Well, then I'm doing something wrong. The x-axis corresponds to the total number of time-steps simulated across all episodes, and this is true for both cumulative reward and episode length...

Maybe I have misunderstood something completely. I understand that an agent episode/step is different from an environment/academy episode/step. But within the Unity environment with the runtime plugin, there is no way to end environment episodes, only agent episodes via agent.Done().

I only have a single agent, so I was assuming the environment would count an episode as ended once all agents are set to done. How on earth would a single agent time-step be registered as a whole episode?

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.