Can I train PPO for more than 10 Million steps?

Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

https://unity.com/products/machine-learning-agents


Can I train PPO for more than 10 Million steps? #4193

Status: Closed (knmitri closed 2 years ago)

knmitri commented 4 years ago

I am training a complicated model that needs a lot of training steps to converge. In several runs (with different settings and environments), my training loss suddenly increases drastically after 10 million steps, as shown in the picture. I did not have this issue before. Is there a limit on training steps in ML-Agents Release 3?

Unity Version: [e.g. Unity 2019.3.6f1]
OS + version: [e.g. Windows 10]
ML-Agents version: (e.g. ML-Agents v3.0)
TensorFlow version: 2.2.0
Environment: custom environment
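Some context on how the configured step count feeds into PPO training. According to the ML-Agents training configuration documentation, each behavior's trainer runs until its `max_steps`, and with the default PPO setting `learning_rate_schedule: linear` the learning rate decays linearly, reaching 0 at `max_steps`. The Python sketch below only illustrates that kind of linear annealing. `linear_decay` is a hypothetical helper, not part of the ML-Agents API, and the learning rate, floor value, and step counts are assumptions chosen for illustration.

```python
def linear_decay(initial_value: float, min_value: float, max_step: int, step: int) -> float:
    """Linearly anneal a value from initial_value down to min_value over max_step steps.

    Hypothetical helper for illustration only (not the ML-Agents API).
    Steps below 0 or beyond max_step are clamped, so once step >= max_step
    the value stays at min_value.
    """
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    clamped = min(max(step, 0), max_step)
    fraction_remaining = 1.0 - clamped / max_step
    return (initial_value - min_value) * fraction_remaining + min_value


if __name__ == "__main__":
    initial_lr = 3.0e-4        # assumed initial learning rate
    floor = 1.0e-10            # assumed small floor instead of exactly 0
    max_steps = 10_000_000     # assumed max_steps from the trainer config
    for step in (0, 5_000_000, 10_000_000, 20_000_000):
        print(step, linear_decay(initial_lr, floor, max_steps, step))
```

With a linear schedule like this, every step past `max_steps` sees the same floor value, so the step budget you configure directly shapes the learning-rate curve over the whole run.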

Shubhamai commented 4 years ago

Have you enabled curiosity? What did the cumulative reward graph look like?

knmitri commented 4 years ago

@Shubhamai No, I did not enable curiosity. The cumulative reward is also dropping to large negative values.

Shubhamai commented 4 years ago

@knmitri I don't think there is a problem or bug anywhere; I would say that is normal.

ZaiyunLin commented 4 years ago

I had the exact same issue with my model. I don't have this problem on another PC with an older ML-Agents version. With the exact same project, the same settings and everything else, the newer ML-Agents version collapses after a certain number of steps. This is definitely a bug.

niskander commented 4 years ago

@knmitri @zaiyun Just curious: which versions were you upgrading from (the ones that did not have this issue)?

I've seen this happen with SAC (training is going well with no errors in the training environment, then all of a sudden the agents stop playing), and it happened whenever I used negative rewards, which @zaiyun also seems to use.

However I don't have this issue with PPO, at least not in version 1.0.2. I can train up to 40M+ steps.

ZaiyunLin commented 4 years ago

I did not upgrade from an older version; I installed the newest version directly. On another PC, I installed an older version, which I believe is Release 2.

vincentpierre commented 3 years ago

Hi, is there a way to reproduce this issue with one of the example environments on the latest version of ML-Agents? I am unable to reproduce it.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

hvpeteet commented 2 years ago

Closing since stale

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.