The agent interacts with the environment to learn how to solve a classification problem. If we let the agent learn for N episodes, how can we track its performance? How do we test it after training is done?
Here are several performance metrics that we can track during training (per episode):
Final training accuracy and training loss.
Total number of actions vs effective number of training steps.
Validation accuracy on the hold-out set.
Execution time (not under the agent's direct control, but useful to track).
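A minimal sketch of how these per-episode metrics could be recorded; the names `EpisodeMetrics` and `MetricLog` are illustrative, not from any specific codebase:

```python
from dataclasses import dataclass, field

@dataclass
class EpisodeMetrics:
    """One record per training episode, mirroring the metrics listed above."""
    episode: int
    train_accuracy: float
    train_loss: float
    total_actions: int       # every action the agent took
    effective_steps: int     # actions that actually produced a training step
    val_accuracy: float      # accuracy on the hold-out set
    wall_time_s: float       # execution time, logged but not controlled by the agent

@dataclass
class MetricLog:
    records: list = field(default_factory=list)

    def log(self, **kwargs):
        self.records.append(EpisodeMetrics(**kwargs))

    def best_episode(self):
        # Episode with the highest hold-out validation accuracy.
        return max(self.records, key=lambda r: r.val_accuracy)
```

Logging one record per episode keeps the post-training analysis (e.g. picking the best episode by validation accuracy) trivial.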
After the agent has finished training, we can evaluate the resulting classifier in several ways:
Make the agent train a classifier from scratch and evaluate it on a test set (a second hold-out).
Take the best classifier trained by the agent during the entire training session and evaluate it on the test set.
Make the agent train a classifier on another (related) problem and evaluate it there.
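The second option above can be sketched as follows; `StubClassifier` and the `(classifier, val_accuracy)` checkpoint tuples are hypothetical stand-ins for whatever the agent actually produces per episode:

```python
def select_best(checkpoints):
    """checkpoints: list of (classifier, val_accuracy) gathered per episode.
    Returns the classifier with the highest hold-out validation accuracy."""
    clf, _ = max(checkpoints, key=lambda c: c[1])
    return clf

class StubClassifier:
    """Stand-in classifier that always predicts one fixed label."""
    def __init__(self, label):
        self.label = label
    def score(self, X, y):
        # Fraction of test labels matching the fixed prediction.
        preds = [self.label] * len(y)
        return sum(p == t for p, t in zip(preds, y)) / len(y)

# Pretend two episodes produced these classifiers and validation scores.
checkpoints = [(StubClassifier(0), 0.60), (StubClassifier(1), 0.80)]
best = select_best(checkpoints)
test_acc = best.score([None] * 4, [1, 1, 0, 1])  # scored once, on the test set
```

The key point is that model selection uses only the validation (hold-out) scores, so the test set is touched exactly once, at the very end.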
This issue is open for discussions. Alternatives or extensions are welcome!