Unable to retrace demonstrations from python

Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.

https://unity.com/products/machine-learning-agents

Unable to retrace demonstrations from python #3216

Closed m-rph closed 4 years ago

m-rph commented 4 years ago

Describe the bug: I am loading demonstrations from Python using mlagents.trainers.demo_loader. If I try to retrace the trajectory, i.e. reset the environment and select the same actions as those in the demo, the agent fails to follow the demonstration.

To Reproduce Steps to reproduce the behavior:

1. Record a demonstration.
2. Replay the demonstration from Python.

Environment (please complete the following information):

MacOS 10.15.2
ML-Agents version: 0.10.0
Environment: Obstacle Tower

NOTE: We are unable to help reproduce bugs with custom environments. Please attempt to reproduce your issue with one of the example environments, or provide a minimal patch to one of the environments needed to reproduce the issue.

xiaomaogy commented 4 years ago

What do you mean by "if I try to retrace the trajectory"? Could you provide more detailed steps?

m-rph commented 4 years ago

I can give you the code, actually:

from mlagents.trainers import demo_loader

def replay(path, **other_params):
    brain_params, brain_infos, _ = demo_loader.load_demonstration(str(path))
    # Some initialization and setup (creates env; omitted here)
    env.reset()
    # Start from index 1 because each entry carries the previous action
    for binfo in brain_infos[1:]:
        # process_info (my own helper) extracts the previous_action vector
        _, _, _, info = process_info(binfo)
        _, _, _, newinfo = env.step(info.previous_action)

My goal is to take a demonstration and repeat the same steps as those in the demo, so that I end up at the same location as the demo's final step.

xiaomaogy commented 4 years ago

In this case you are assuming the environment is always fixed. If it is, then taking the exact same action at every step will let you retrace the demonstration.

However, the Obstacle Tower environment is not a fixed environment, so you won't be able to retrace the demo unless you fix the seed that controls the generation of the environment (and maybe other things that might vary, for example physics in Unity).
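To illustrate the point about seeds, here is a minimal sketch using a toy environment. `ToyEnv` and `rollout` are hypothetical names invented for this example; they are not part of ML-Agents or Obstacle Tower, and the toy dynamics only stand in for whatever randomness the real environment has. Replaying the same actions reproduces the recorded trajectory when the seed matches; with a different seed there is no such guarantee.

```python
import random


class ToyEnv:
    """Hypothetical seeded environment; a stand-in, not Obstacle Tower."""

    def __init__(self, seed):
        self.seed = seed
        self.reset()

    def reset(self):
        # Re-create the generator so every episode starts from the same state.
        self.rng = random.Random(self.seed)
        self.position = 0
        return self.position

    def step(self, action):
        # The outcome depends on the action and on seeded randomness.
        self.position += action + self.rng.choice([-1, 0, 1])
        return self.position


def rollout(env, actions):
    """Reset the environment and apply the actions in order."""
    env.reset()
    return [env.step(a) for a in actions]


actions = [1, 1, -1, 1, 0, 1, 1, -1]
recorded = rollout(ToyEnv(seed=7), actions)
same_seed = rollout(ToyEnv(seed=7), actions)
other_seed = rollout(ToyEnv(seed=8), actions)

print("same seed matches:", same_seed == recorded)
print("other seed matches:", other_seed == recorded)
```

The sketch only covers randomness that the seed controls. Anything outside the seed, such as physics timing, can still make a replay drift even when the actions are identical.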

m-rph commented 4 years ago

Yes, I have it fixed to the same seed as the one used for the demo. That is why I posted this as a bug.

xiaomaogy commented 4 years ago

Maybe there is something else you need to keep fixed.
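One way to narrow down what else varies is to compare the observations recorded in the demo with the ones returned during the replay and find the first step where they differ. The sketch below is a hedged illustration only: `first_divergence` is a hypothetical helper, the observations are assumed to be plain lists of floats, and the sample data is made up. How you pull observations out of the demo and the environment depends on your setup.

```python
def first_divergence(recorded, replayed, tol=1e-6):
    """Return the index of the first step where the sequences differ, or None."""
    for step, (rec, rep) in enumerate(zip(recorded, replayed)):
        # A step diverges if the vectors differ in length or in any component.
        if len(rec) != len(rep) or any(abs(a - b) > tol for a, b in zip(rec, rep)):
            return step
    # All shared steps match; a length mismatch diverges where the shorter one ends.
    if len(recorded) != len(replayed):
        return min(len(recorded), len(replayed))
    return None


# Made-up vector observations, one list per step.
recorded = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [0.3, 0.2]]
replayed = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.4], [0.9, 0.9]]

print(first_divergence(recorded, replayed))  # 2
```

If the divergence shows up at step 0, the initial state itself differs (for example the generated level), whereas a later divergence points to something that varies during the episode.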

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had activity in the last 14 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically closed because it has not had activity in the last 28 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.