Closed · Alexander-Bakogeorge closed this issue 3 years ago
Hi @Alexander-Bakogeorge
Can you share your trainer configuration too?
@andrewcoh
```yaml
trainer: ppo
batch_size: 128
beta: 5.0e-3
buffer_size: 1024
epsilon: 0.2
hidden_units: 128
lambd: 0.95
learning_rate: 3.0e-4
learning_rate_schedule: linear
max_steps: 1.0e7
memory_size: 256
normalize: false
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 10000
use_recurrent: false
vis_encode_type: simple
reward_signals:
  extrinsic:
    strength: 1.0
    gamma: 0.99
  curiosity:
    strength: 0.02
    gamma: 0.99
    encoding_size: 256
```
At what point in training did you record that Python GIF?
Can you describe the agent's objective and the reward function?
Clips were recorded at ~9 million steps.
The goal is for it to be a cooperative task, where the agents need to work together to keep themselves and the weight above the equator. If any of the agents, or the weight, drops past the equator, Done is called on the agents. They are then re-positioned to the top of the planet along with the weight, and the next episode starts.
The agents get 0.1 reward per decision while they are alive, with no penalties for environment resets or dying.
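For clarity, the reward and termination scheme described above can be sketched as follows. All names here (`EQUATOR_Y`, `step_rewards`, etc.) are illustrative only; in the actual project this logic would live in the agents' Unity C# `Agent` class, not in Python.

```python
# Illustrative sketch of the reward/termination scheme described above.
# Names are hypothetical; the real project implements this in Unity C#.

ALIVE_REWARD = 0.1   # reward per decision while an agent is alive
EQUATOR_Y = 0.0      # agents and the weight must stay above this height

def step_rewards(agent_heights, weight_height):
    """Return per-agent rewards and whether the episode should end.

    The episode ends (Done is called on every agent) as soon as any
    agent or the weight drops past the equator; there is no penalty
    for dying or for the environment reset.
    """
    episode_done = (weight_height < EQUATOR_Y or
                    any(h < EQUATOR_Y for h in agent_heights))
    rewards = [0.0 if episode_done else ALIVE_REWARD
               for _ in agent_heights]
    return rewards, episode_done
```

This purely survival-based shaping (constant positive reward per decision, no terminal penalty) matches the behaviour described: agents maximise return simply by staying above the equator as long as possible.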
Ok, I understand. Just to be sure, did you drag and drop the newly learned models into your agents? Is this the same behavior you see when you just press play in the editor without launching the python process?
I did just drag and drop the newly learned models into my agents' Behavior Parameters component post-training. The first GIF I shared is the behaviour I see when I press play in the editor without the Python process.
Thanks for reporting this issue @Alexander-Bakogeorge. I have added it to our bug tracker.
This issue has been automatically marked as stale because it has not had activity in the last 14 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 28 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
**Describe the bug**
Hi, I'm trying to train 2 agents to balance a weight on a small planet. So far my training runs have been successful, but I've noticed an odd behaviour: when I run inference through Python (`mlagents-learn`, sans `--train`), my agents move more sporadically than their exported `.nn` file counterparts. I've attached two GIFs below to illustrate (both are at 10× timescale; red balls are the agents, the white ball is the weight).
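For reference, the two modes being compared can be reproduced with the ML-Agents CLI of that era (where `--train` toggled training). The config path and run ID below are placeholders, not taken from the report:

```shell
# Training run: learns a policy and exports the .nn model on exit
mlagents-learn config/trainer_config.yaml --run-id=planet-balance --train

# Inference through Python: the same command without --train
mlagents-learn config/trainer_config.yaml --run-id=planet-balance
```

In both cases you press play in the Unity editor when prompted; the `.nn` comparison instead drags the exported model into the agents' Behavior Parameters and plays without any Python process.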
**Screenshots**
[.nn files GIF] [Python GIF]
**Environment**