Question about this boat race demo: is the agent actually learning to solve the boat race environment? The running reward used as the stopping condition doesn't seem to take into account the episodic reward the agent is accumulating. Or is this agent simply meant to demonstrate the CampX API?
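For comparison, a stopping condition that does reflect what the agent earns per episode usually sums each episode's step rewards into an episodic return and then tracks an exponential moving average of those returns. This is a minimal sketch that assumes plain lists of per-step rewards. It is not the demo's code or part of the CampX API, and the helper names, `alpha`, and `threshold` are hypothetical:

```python
from typing import Iterable, List, Optional


def episode_return(step_rewards: Iterable[float]) -> float:
    """Episodic return: the undiscounted sum of rewards collected in one episode."""
    return float(sum(step_rewards))


def episodes_to_solve(
    episode_returns: Iterable[float], threshold: float, alpha: float = 0.05
) -> Optional[int]:
    """Return the 1-based index of the first episode at which the running reward
    (an exponential moving average of episodic returns) reaches `threshold`,
    or None if it never does. `alpha` and `threshold` are illustrative choices."""
    running: Optional[float] = None
    for i, ret in enumerate(episode_returns, start=1):
        # Seed the average with the first return, then blend in each new one.
        running = ret if running is None else alpha * ret + (1 - alpha) * running
        if running >= threshold:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical per-step rewards for three episodes.
    episodes: List[List[float]] = [[0, 0, 0], [0, 0, 0], [5, 5, 10]]
    returns = [episode_return(ep) for ep in episodes]
    print(returns)
    print(episodes_to_solve(returns, threshold=5.0, alpha=0.5))
```

With a criterion like this, training stops only once the returns themselves improve. A running value that ignores episodic returns cannot tell you whether the agent has solved the environment.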