Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Different behaviour between frozen and live model #3780

Closed Alexander-Bakogeorge closed 3 years ago

Alexander-Bakogeorge commented 4 years ago

Describe the bug
Hi, I'm trying to train two agents to balance a weight on a small planet. So far my training runs have been successful, but I've noticed an odd behaviour. When I run inference through Python (mlagents-learn without --train), my agents move more erratically than their exported .nn counterparts do. I've attached two gifs below to illustrate (both are at 10x timescale; the red balls are agents, the white ball is the weight).

Screenshots
.nn files: gif1
Python: gif2

Environment

andrewcoh commented 4 years ago

Hi @Alexander-Bakogeorge

Can you share your trainer configuration too?

Alexander-Bakogeorge commented 4 years ago

@andrewcoh

    trainer: ppo
    batch_size: 128
    beta: 5.0e-3
    buffer_size: 1024
    epsilon: 0.2
    hidden_units: 128
    lambd: 0.95
    learning_rate: 3.0e-4
    learning_rate_schedule: linear
    max_steps: 1.0e7
    memory_size: 256
    normalize: false
    num_epoch: 3
    num_layers: 2
    time_horizon: 64
    sequence_length: 64
    summary_freq: 10000
    use_recurrent: false
    vis_encode_type: simple
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99
        curiosity:
            strength: 0.02
            gamma: 0.99
            encoding_size: 256

andrewcoh commented 4 years ago

At what point in training did you record that python gif?

Can you describe the agent's objective and the reward function?

Alexander-Bakogeorge commented 4 years ago

Clips were recorded at ~9 million steps.

The goal is for it to be a cooperative task where the agents need to work together to keep themselves and the weight above the equator. If any agent, or the weight, drops past the equator, Done is called on the agents. They are then repositioned at the top of the planet along with the weight, and the next episode starts.

The agents get a 0.1 reward per decision while they are still alive, with no penalty for environment resets or dying.
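
For concreteness, the reward scheme described above might look something like this in agent code. This is only a minimal sketch against the 0.15-era C# API (AgentAction, AgentReset, Done); the class, field, and helper names are illustrative assumptions, not taken from the original project, and the environment-level coordination that ends every agent's episode together is omitted.

    using MLAgents;      // 0.15-era namespace; later releases use Unity.MLAgents
    using UnityEngine;

    public class BalancerAgent : Agent
    {
        // Hypothetical scene references; the original project's names are unknown.
        public Transform planetCenter;
        public Transform weight;

        public override void AgentAction(float[] vectorAction)
        {
            // Movement from vectorAction would be applied here (omitted).

            // +0.1 per decision while the episode is still running.
            AddReward(0.1f);

            // End the episode when this agent or the weight drops past the
            // equator, approximated here as a negative offset along world up
            // from the planet's centre.
            if (BelowEquator(transform.position) || BelowEquator(weight.position))
            {
                Done();   // no penalty on reset, matching the description above
            }
        }

        public override void AgentReset()
        {
            // Reposition this agent (and the weight) at the top of the planet
            // for the next episode.
        }

        bool BelowEquator(Vector3 worldPos)
        {
            return (worldPos - planetCenter.position).y < 0f;
        }
    }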

andrewcoh commented 4 years ago

Ok, I understand. Just to be sure: did you drag and drop the newly trained models onto your agents? Is this the same behavior you see when you just press Play in the editor without launching the Python process?

Alexander-Bakogeorge commented 4 years ago

Yes, I dragged and dropped the newly trained models into my agents' Behavior Parameters component after training. The first gif I shared shows the behaviour I see when I press Play in the editor without the Python process.
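
For reference, the model swap can also be done from script rather than by drag and drop. Again a minimal sketch against the 0.15-era API, where Agent exposes GiveModel (renamed SetModel in later releases); the behaviour name and field names here are placeholders:

    using Barracuda;     // 0.15-era namespace; later releases use Unity.Barracuda
    using MLAgents;
    using UnityEngine;

    public class ModelSwapper : MonoBehaviour
    {
        public Agent agent;
        public NNModel trainedModel;   // the exported .nn asset

        void Start()
        {
            // The behaviour name must match the one set in Behavior Parameters;
            // "Balancer" is a placeholder, not a name from the original project.
            agent.GiveModel("Balancer", trainedModel);
        }
    }

Assigning the model at runtime like this makes it easier to verify that the agent in the editor is actually running the newly exported .nn file.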

anupam-142857 commented 4 years ago

Thanks for reporting this issue, @Alexander-Bakogeorge. I have added it to our bug tracker.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had activity in the last 14 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 3 years ago

This issue has been automatically closed because it has not had activity in the last 28 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.