Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Vector Observation Size Mismatch Error on Start #2804

Closed: firepro20 closed this issue 5 years ago

firepro20 commented 5 years ago

I have an issue only when starting my game environment: the vector observation size is 6 observations short on start (first frame). I am assuming CollectObservations is being called before OnTriggerEnter, which is where I populate the list that feeds observations to the former method.

On each subsequent frame update the size is correct and I am sending 14 observations. Until I fix this issue, training is not possible, as Anaconda throws the following error:

Traceback (most recent call last):
  File "c:\users\dzamm\anaconda3\envs\ml-agents\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "c:\users\dzamm\anaconda3\envs\ml-agents\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "c:\users\dzamm\anaconda3\envs\ml-agents\lib\site-packages\mlagents\envs\subprocess_env_manager.py", line 116, in worker
    cmd.payload[0], cmd.payload[1], cmd.payload[2]
  File "c:\users\dzamm\anaconda3\envs\ml-agents\lib\site-packages\mlagents\envs\environment.py", line 352, in reset
    self._n_agents[_b] = len(s[_b].agents)
KeyError: 'RLBrain'

This is how I collect observations -

public override void CollectObservations()
    {
        // Target and Agent positions
        AddVectorObs(target.position); // Vector Space 3 // issue: target is only set after the first FixedUpdate; fixing that ordering should fix this?
        AddVectorObs(transform.position); // Vector Space 3

        // Spikes Position
        foreach(GameObject spike in spikesList) // 3 + 3
        {
            AddVectorObs(spike.transform.position); // only two spikes at a time.. closest spikes
        }

        // Agent velocity
        AddVectorObs(rBody.velocity.x); // Vector Space 1
        AddVectorObs(rBody.velocity.z); // Vector Space 1
    }
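
One workaround I can think of is to zero-pad the spike slots so the count stays at 14 regardless of when OnTriggerEnter fires, though I'd prefer to fix the ordering itself. A rough sketch with the same fields as above:

public override void CollectObservations()
    {
        AddVectorObs(target.position);    // Vector Space 3
        AddVectorObs(transform.position); // Vector Space 3

        // Always emit exactly 2 x 3 floats for spikes,
        // zero-filled until real spikes are being tracked
        for (int i = 0; i < 2; i++)
        {
            if (i < spikesList.Count)
            {
                AddVectorObs(spikesList[i].transform.position);
            }
            else
            {
                AddVectorObs(Vector3.zero); // placeholder observation
            }
        }

        AddVectorObs(rBody.velocity.x); // Vector Space 1
        AddVectorObs(rBody.velocity.z); // Vector Space 1
    }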

This is how I add to the spikes list [ensuring that the list has a fixed size of 2] -

private void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("Collectible") && !collectibleList.Contains(other.gameObject))
        {
            collectibleList.Add(other.gameObject);
            // Go to collectible here
            // if we detect something, go to raycast?
            //transform.Rotate(0, GetRotation(GetNearest(collectibleList).transform.position), 0); // lerp for smoothness?
            // If something is detected in the ray, always get a new direction, but this should be handled in Update <-
            Debug.Log("Enter Trigger - Currently there is/are " + collectibleList.Count + " collectibles within reach");
        }

        if (other.CompareTag("Hazard") && !spikesList.Contains(other.gameObject))
        {
            if (spikesList.Count < 2)
            {
                spikesList.Add(other.gameObject);
            }
            // Once the list is full, replace slots alternately:
            // first index 0, then index 1, then index 0, etc.
            else if (!indexMoved)
            {
                spikesList[0] = other.gameObject;
                indexMoved = true;
            }
            else
            {
                spikesList[1] = other.gameObject;
                indexMoved = false;
            }
        }
    }

The question is: how can I ensure that I also get 14 vector observations at the start, instead of just 8, given that the list is populated after CollectObservations is first called?

surfnerd commented 5 years ago

Hi @firepro20. Indeed, CollectObservations is called before the OnTriggerXXX callbacks, as laid out by the MonoBehaviour event execution order diagram; in that diagram, CollectObservations is called from FixedUpdate.

It seems like you are using a trigger to let your Agent know when it's intersecting an object's collider on the first frame (and maybe subsequent frames). You could instead give your agent a reference to this first object so it can initialize itself in Awake. That way, when CollectObservations is called on the first frame, your Agent will be properly initialized, and you can then use your OnTriggerEnter function to update it as you already are.

Does this make sense?
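
For example, a rough sketch (it assumes the hazards already exist in the scene and carry the same "Hazard" tag you check in OnTriggerEnter):

private void Awake()
    {
        spikesList = new List<GameObject>();

        // Seed the list from the scene so CollectObservations already
        // sees two spikes on the very first FixedUpdate.
        foreach (GameObject spike in GameObject.FindGameObjectsWithTag("Hazard"))
        {
            if (spikesList.Count >= 2)
            {
                break;
            }
            spikesList.Add(spike);
        }
    }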

firepro20 commented 5 years ago

This has worked. I am initializing in Awake:

    private void Awake()
    {
        spikesList = new List<GameObject>();

        GameObject spikeOne = new GameObject();
        GameObject spikeTwo = new GameObject();
        // Initialisation before overwriting on each subsequent frame after first
        spikesList.Insert(0, spikeOne);
        spikesList.Insert(1, spikeTwo);
    } 

Now I have a new problem, and I'm not sure if it's related. I need some help understanding the output, as there are no warnings or errors in Unity, yet training ends two seconds in with the following Anaconda message -

INFO:mlagents.envs:Hyperparameters for the PPOTrainer of brain RLBrain:
        trainer:        ppo
        batch_size:     4096
        beta:   0.005
        buffer_size:    40960
        epsilon:        0.2
        hidden_units:   256
        lambd:  0.95
        learning_rate:  0.0001
        learning_rate_schedule: linear
        max_steps:      5.0e6
        memory_size:    512
        normalize:      False
        num_epoch:      8
        num_layers:     2
        time_horizon:   1024
        sequence_length:        64
        summary_freq:   1000
        use_recurrent:  False
        vis_encode_type:        simple
        reward_signals:
          extrinsic:
            strength:   1.0
            gamma:      0.99
        summary_path:   ./summaries/RLAgent-3_RLBrain
        model_path:     ./models/RLAgent-3-0/RLBrain
        keep_checkpoints:       5
Process Process-1:
Traceback (most recent call last):
  File "c:\users\dzamm\anaconda3\envs\ml-agents\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "c:\users\dzamm\anaconda3\envs\ml-agents\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "c:\users\dzamm\anaconda3\envs\ml-agents\lib\site-packages\mlagents\envs\subprocess_env_manager.py", line 116, in worker
    cmd.payload[0], cmd.payload[1], cmd.payload[2]
  File "c:\users\dzamm\anaconda3\envs\ml-agents\lib\site-packages\mlagents\envs\environment.py", line 352, in reset
    self._n_agents[_b] = len(s[_b].agents)
KeyError: 'RLBrain'
INFO:mlagents.envs:Learning was interrupted. Please wait while the graph is generated.
INFO:mlagents.envs:Saved Model
INFO:mlagents.trainers:List of nodes to export for brain :RLBrain
INFO:mlagents.trainers: is_continuous_control
INFO:mlagents.trainers: version_number
INFO:mlagents.trainers: memory_size
INFO:mlagents.trainers: action_output_shape
INFO:mlagents.trainers: action
INFO:mlagents.trainers: action_probs
INFO:tensorflow:Froze 11 variables.
INFO:tensorflow:Froze 11 variables.
Converted 11 variables to const ops.
Converting ./models/RLAgent-3-0/RLBrain/frozen_graph_def.pb to ./models/RLAgent-3-0/RLBrain.nn
IGNORED: StopGradient unknown layer
GLOBALS: 'is_continuous_control', 'version_number', 'memory_size', 'action_output_shape'
IN: 'vector_observation': [-1, 1, 1, 14] => 'main_graph_0/hidden_0/BiasAdd'
IN: 'epsilon': [-1, 1, 1, 2] => 'mul'
OUT: 'action', 'action_probs'
DONE: wrote ./models/RLAgent-3-0/RLBrain.nn file.
INFO:mlagents.trainers:Exported ./models/RLAgent-3-0/RLBrain.nn file

Not sure if this is the appropriate place to post the above, but it seems related to the previous error.

firepro20 commented 5 years ago

I think I know what the issue is: I am reloading the whole level when my agent dies. I will try to avoid this by marking the agent as Done when it has no health, so the simulation can start over without restarting/loading the whole level.
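
Roughly what I have in mind (just a sketch; healthPoints and startPosition are my own placeholder names):

// Inside the agent's health check, instead of reloading the scene:
    if (healthPoints <= 0f)
    {
        SetReward(-1f); // optional penalty for dying
        Done();         // ends this agent's episode without touching the level
    }

    // Called by ML-Agents at the start of the next episode
    public override void AgentReset()
    {
        transform.position = startPosition;
        rBody.velocity = Vector3.zero;
    }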

surfnerd commented 5 years ago

Just an FYI: using new to create GameObjects isn't technically supported. Although it works as a workaround for you, I'd recommend either pulling a game object from the scene or instantiating a prefab instead. You can read about it here: https://docs.unity3d.com/Manual/CreateDestroyObjects.html
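
For instance, something like this (spikePlaceholderPrefab is just a hypothetical field you would assign in the Inspector):

[SerializeField]
    private GameObject spikePlaceholderPrefab;

    private void Awake()
    {
        spikesList = new List<GameObject>();
        // Instantiate real (supported) objects instead of calling new GameObject()
        spikesList.Add(Instantiate(spikePlaceholderPrefab));
        spikesList.Add(Instantiate(spikePlaceholderPrefab));
    }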

firepro20 commented 5 years ago

I acknowledge this is not technically supported; however, it temporarily solves my problem of filling the observation vector with placeholder data at the very start. Feel free to close this issue. Thanks for the help!

surfnerd commented 5 years ago

Closing per your last comment. Thanks for posting.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.