Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Unexpected getSteps Behavior with Python API #4992

Closed PeterKeffer closed 2 years ago

PeterKeffer commented 3 years ago

Describe the bug I'm getting weird observations back. Sometimes I get only 1 TerminalStep and 0 DecisionSteps, sometimes 2 TerminalSteps and 0 DecisionSteps, and, even worse, I sometimes get 12 DecisionSteps (with a total of 12 agents) AND 1 TerminalStep. I see this behavior in 3DBall and in my own environment.

I don't think this was the case in October last year...? Is this a bug, or is it expected? It makes many things a lot harder and more complex to program.

EDIT: I've just seen that GridWorld behaves differently.

To Reproduce
1. Go to the Colab 01 example
2. Select 3DBall (or probably any other environment)
3. Add print("decision_steps: " + str(len(decision_steps)) + " terminal_steps: " + str(len(terminal_steps))) to the loop
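For reference, a loop with that print added looks roughly like this (a paraphrase of the Colab cell, not the exact code; assumes a release that has the default_registry and the ActionSpec/ActionTuple API):

```python
from mlagents_envs.registry import default_registry

# Load 3DBall from the built-in environment registry, as the Colab does.
env = default_registry["3DBall"].make()
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for _ in range(200):
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    print("decision_steps: " + str(len(decision_steps)) + " terminal_steps: " + str(len(terminal_steps)))
    if len(decision_steps) > 0:
        # Act randomly for whichever agents requested a decision this step.
        env.set_actions(behavior_name, spec.action_spec.random_action(len(decision_steps)))
    env.step()

env.close()
```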

Console logs / stack traces

decision_steps: 12 terminal_steps: 0
decision_steps: 12 terminal_steps: 1
decision_steps: 12 terminal_steps: 0
decision_steps: 12 terminal_steps: 0
decision_steps: 12 terminal_steps: 0
decision_steps: 12 terminal_steps: 0
decision_steps: 0 terminal_steps: 2
decision_steps: 0 terminal_steps: 1
decision_steps: 12 terminal_steps: 0
decision_steps: 0 terminal_steps: 1
Total rewards for episode 0 is 0.8000000268220901
decision_steps: 12 terminal_steps: 0


andrewcoh commented 3 years ago

Thanks for raising this. The reason is that agents can terminate in between decision intervals (set in the DecisionRequester component), but they only receive decision steps on the decision interval. I don't believe this is a bug, but perhaps we can clean it up a bit so that the behavior is more predictable.
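Concretely, on the step where an agent's episode ends between decision requests, that agent shows up in terminal_steps while the other agents may still show up in decision_steps, so the two collections are meant to be handled separately. A minimal sketch (the finish_episode/record_observation helpers are placeholders, not part of the API):

```python
decision_steps, terminal_steps = env.get_steps(behavior_name)

# Episode-ending data: these agent_ids finished since the last get_steps() call.
# Their termination may or may not coincide with a decision interval.
for agent_id in terminal_steps:
    step = terminal_steps[agent_id]
    finish_episode(agent_id, step.obs, step.reward, step.interrupted)  # placeholder helper

# Agents currently asking for an action (only on their decision interval).
for agent_id in decision_steps:
    step = decision_steps[agent_id]
    record_observation(agent_id, step.obs, step.reward)  # placeholder helper
```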

Would you mind elaborating on what this is complicating for you?

PeterKeffer commented 3 years ago

I found it very exhausting to write a proper, simple yet modular ReplayBuffer for this Python API (coming from a gym background). I don't want to handle it with the amount of dicts (and the coupling) that the Colab 02 example uses. And in the decision_steps: 12 terminal_steps: 1 case, I don't know whether the terminal step happened before or after the decision step for that agent. Logically it only makes sense that the terminal step happened before the decision step, but do I truly know this? Did I mention that in this case I get 1 TerminalStep and 1 DecisionStep for the same agent_id?
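For concreteness, here is roughly the bookkeeping this forces on the consumer (a minimal sketch, assuming the terminal entry for an agent_id does refer to the episode that ended before its decision entry in the same get_steps() result; choose_action is a placeholder):

```python
# last[agent_id] holds (obs, action) waiting for its next obs / reward.
last = {}
replay_buffer = []  # list of (obs, action, reward, next_obs, done) tuples


def on_steps(decision_steps, terminal_steps, choose_action):
    # Close out finished episodes first; the terminal step carries the final obs/reward.
    for agent_id in terminal_steps:
        step = terminal_steps[agent_id]
        if agent_id in last:
            obs, action = last.pop(agent_id)
            replay_buffer.append((obs, action, step.reward, step.obs, True))

    # Then handle decision requests; an agent_id that also appeared in
    # terminal_steps is simply starting a fresh episode here.
    actions = {}
    for agent_id in decision_steps:
        step = decision_steps[agent_id]
        if agent_id in last:
            obs, action = last.pop(agent_id)
            replay_buffer.append((obs, action, step.reward, step.obs, False))
        actions[agent_id] = choose_action(step.obs)  # placeholder policy call
        last[agent_id] = (step.obs, actions[agent_id])
    return actions
```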

Maybe I'm making a thinking mistake here, but I found it a lot harder to work with Unity ml_agents than with the general gym API. Even the API from your old 0.4.0 (with Brains) was way easier for me. But maybe I'm missing some part of this :)

Thank you so much for helping! I really appreciate this!

batu commented 3 years ago

I'd also like to add another use case:

The fact that len(terminal_steps) + len(decision_steps) varies from call to call makes it difficult to work with learning frameworks that expect the same number of agent actions in each "step".
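A workaround sketch (not a library feature; it assumes a fixed population whose agent_ids stay in 0..NUM_AGENTS-1 and a single vector observation) is to pad every get_steps() result out to a fixed number of rows and carry a validity mask:

```python
import numpy as np

NUM_AGENTS = 12  # assumed fixed agent population for this environment


def collect_fixed_step(decision_steps, terminal_steps, obs_size):
    """Assemble fixed-size per-agent arrays from one get_steps() result.

    Agents with no entry this call are masked out rather than dropped, so
    downstream code always sees NUM_AGENTS rows.
    """
    obs = np.zeros((NUM_AGENTS, obs_size), dtype=np.float32)
    reward = np.zeros(NUM_AGENTS, dtype=np.float32)
    done = np.zeros(NUM_AGENTS, dtype=bool)
    valid = np.zeros(NUM_AGENTS, dtype=bool)

    for agent_id in terminal_steps:
        step = terminal_steps[agent_id]
        obs[agent_id] = step.obs[0]  # first (and assumed only) observation
        reward[agent_id] = step.reward
        done[agent_id] = True
        valid[agent_id] = True

    for agent_id in decision_steps:
        step = decision_steps[agent_id]
        obs[agent_id] = step.obs[0]
        reward[agent_id] = step.reward
        valid[agent_id] = True

    return obs, reward, done, valid
```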

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 42 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.