Closed by ejunprung 4 years ago
This is great info, @ejunprung. It matches the direction we're going with #639. I'll break this down into to-do items.
Okay, we can probably delete the "reward" and "algorithm" boxes as well. They don't really provide much value.
I agree with getting rid of the algorithm box. Until we have algorithms other than PPO, this information might only prompt users to ask about trying other algorithms, and we wouldn't have a great answer for them.
The reward score box is redundant with the graph. The score should be visible from the graph if we choose to include the graph.
Things I think users would care to be assured of in the dashboard:
Using latest AI technology (deep reinforcement learning)
Many different trials performed (as shown by PBT graph)
Many episodes considered during learning (we should consider making this visible; it's more meaningful to users than "training iteration")
Generally positive slope and convergence of separate trials (as shown by the graph)
Simulation metrics (this we are lacking)
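To make the "many episodes" and "positive slope and convergence" points concrete, here is a rough Python sketch of how those assurances could be derived from per-iteration training results. It is only an illustration: the input shape (a list of `(episodes_this_iteration, mean_reward)` pairs per trial), the function names, and the spread-based convergence check are all assumptions, not how the webapp currently computes anything.

```python
from statistics import mean, pstdev


def linear_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def summarize_trials(trials):
    """Summarize trials given as {name: [(episodes_this_iteration, mean_reward), ...]}.

    The input format is hypothetical; adapt it to whatever the training
    backend actually reports.
    """
    slopes = {name: linear_slope([r for _, r in rows]) for name, rows in trials.items()}
    total_episodes = sum(e for rows in trials.values() for e, _ in rows)
    # "Converging" here simply means the trials' rewards are closer together
    # at the end than at the start (an assumed, simplistic definition).
    first_spread = pstdev([rows[0][1] for rows in trials.values()])
    last_spread = pstdev([rows[-1][1] for rows in trials.values()])
    return {
        "slopes": slopes,
        "total_episodes": total_episodes,
        "all_improving": all(s > 0 for s in slopes.values()),
        "converging": last_spread < first_spread,
    }


# Made-up numbers, for illustration only.
example_trials = {
    "trial_a": [(10, -5.0), (12, -2.0), (11, 1.0), (13, 3.0)],
    "trial_b": [(9, -8.0), (10, -3.0), (12, 0.5), (12, 2.5)],
}
summary = summarize_trials(example_trials)
```

Showing `total_episodes` instead of the iteration count would cover the "many episodes" point, while `all_improving` and `converging` are the kind of signals the graph already conveys visually.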
@kepricon is integrating PBT with the training and updating parts.
Here is a list of UI related changes, split out from here: https://github.com/SkymindIO/pathmind-webapp/issues?q=is%3Aopen+is%3Aissue+label%3Apbt
Mean Reward Graph
PBT executes one continuous run that evolves automatically over time.
Therefore, we don't need the grid below the mean reward graph.
However, we may want to consider explaining what PBT is learning in some way. An example could be a user mousing over the mean reward chart to trace perturbation history.
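As a possible starting point for the mouse-over idea, here is a small Python sketch that looks up the most recent perturbation for a hovered point and formats tooltip text. The `Perturbation` record, its fields, and both helper functions are hypothetical; the real perturbation log from the training backend may look quite different.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Perturbation:
    """One PBT exploit/explore step for a trial (hypothetical record shape)."""
    iteration: int
    trial: str
    cloned_from: str
    changes: dict  # hyperparameter name -> (old value, new value)


def perturbation_at(history, trial, iteration) -> Optional[Perturbation]:
    """Return the latest perturbation applied to `trial` at or before `iteration`."""
    candidates = [p for p in history if p.trial == trial and p.iteration <= iteration]
    return max(candidates, key=lambda p: p.iteration) if candidates else None


def tooltip_text(history, trial, iteration, mean_reward):
    """Build the text shown when hovering a point on the mean reward chart."""
    lines = [f"{trial} @ iteration {iteration}: mean reward {mean_reward:.2f}"]
    p = perturbation_at(history, trial, iteration)
    if p is None:
        lines.append("No perturbation yet")
    else:
        lines.append(f"Last perturbed at iteration {p.iteration} (cloned from {p.cloned_from})")
        for name, (old, new) in sorted(p.changes.items()):
            lines.append(f"  {name}: {old} -> {new}")
    return "\n".join(lines)


# Made-up history, for illustration only.
example_history = [
    Perturbation(5, "trial_a", "trial_b", {"lr": (0.001, 0.0012)}),
    Perturbation(10, "trial_a", "trial_c", {"lr": (0.0012, 0.00096)}),
]
print(tooltip_text(example_history, "trial_a", 12, 1.5))
```

Keeping the lookup separate from the formatting means the chart component would only need the resulting string, whatever shape the perturbation log ends up having.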
Policy Export
The output is now a single policy. We need to remove the logic for exporting one policy for each grid search trial.
Update the right column on the experiment page
These aren't really necessary anymore. I'd only care to see the reward function and time elapsed.
As for the reward score, I prefer to see it clearly on the graph because its change over time (i.e. its shape) is more important than the static number that we provide. Perhaps we can reintroduce this using simulation metrics later.