Is there any explicit reward/penalty feedback (a game score, a win or lose signal)?
As I understand it, the feedback is unstructured text.
Even if there is a game score, it is provided within the text.
Is that all? Should the agent recognize the game score, the end of the game, and winning or losing from the text alone?
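
To make the question concrete, this is roughly what I imagine the agent would have to do if the text is the only feedback. It is only a sketch: the message formats in the regular expressions and the names `Feedback` and `parse_feedback` are my own assumptions, and real games word these messages differently.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical message formats; each game may phrase these differently.
SCORE_RE = re.compile(r"score (?:is|of) (-?\d+)", re.IGNORECASE)
WIN_RE = re.compile(r"\*+\s*you have won\s*\*+", re.IGNORECASE)
LOSE_RE = re.compile(r"\*+\s*you have (?:died|lost)\s*\*+", re.IGNORECASE)


class Feedback(NamedTuple):
    score: Optional[int]  # None when the text mentions no score
    won: bool
    lost: bool


def parse_feedback(text: str) -> Feedback:
    """Extract score and end-of-game signals from raw game text."""
    match = SCORE_RE.search(text)
    score = int(match.group(1)) if match else None
    return Feedback(score, bool(WIN_RE.search(text)), bool(LOSE_RE.search(text)))


print(parse_feedback("Your score is 10 of a possible 350, in 5 moves."))
print(parse_feedback("*** You have died ***"))
```

Pattern matching like this seems fragile, which is why I am asking whether a structured signal exists.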
To my knowledge, there are save, restore (load), and undo commands.
In my opinion, if the agent could use these commands for simulation, we could apply search algorithms.
However, for now the agent can't use the save and restore commands, and I'm not sure the undo command works correctly.
Is it possible to use these commands for simulation?
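
Here is a minimal sketch of the kind of search I have in mind, assuming the environment exposed save and restore to the agent. `ToyGame` and its `step`, `save`, and `restore` methods are hypothetical stand-ins that I made up for illustration, not an existing interface. The search itself is a plain depth-limited depth-first search.

```python
from typing import List, Optional, Tuple


class ToyGame:
    """Hypothetical stand-in for a text game: reach position 3 to win."""

    COMMANDS = ["go east", "go west", "wait"]

    def __init__(self) -> None:
        self.position = 0
        self.moves = 0

    def step(self, command: str) -> Tuple[str, bool]:
        """Apply a command and return (feedback text, game finished)."""
        self.moves += 1
        if command == "go east":
            self.position += 1
        elif command == "go west":
            self.position = max(0, self.position - 1)
        done = self.position == 3
        text = "*** You have won ***" if done else f"You are at position {self.position}."
        return text, done

    def save(self) -> Tuple[int, int]:
        """Return a snapshot of the full game state (assumed to be possible)."""
        return (self.position, self.moves)

    def restore(self, snapshot: Tuple[int, int]) -> None:
        """Roll the game back to a snapshot taken by save()."""
        self.position, self.moves = snapshot


def find_plan(game: ToyGame, depth: int) -> Optional[List[str]]:
    """Depth-limited DFS that simulates commands and undoes them via restore."""
    if depth == 0:
        return None
    snapshot = game.save()
    for command in game.COMMANDS:
        _, done = game.step(command)
        if done:
            game.restore(snapshot)
            return [command]
        rest = find_plan(game, depth - 1)
        game.restore(snapshot)  # undo the simulated command before trying the next
        if rest is not None:
            return [command] + rest
    return None


game = ToyGame()
print(find_plan(game, depth=3))  # ['go east', 'go east', 'go east']
```

The game is left in its original state after the search, so the agent could simulate first and then act for real. This only works if restore reliably resets the whole game state, which is exactly the part I am unsure about.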