Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Different training results when normalizing inputs #5454

Closed kkalera closed 2 years ago

kkalera commented 3 years ago

I've been testing some theories I had and stumbled on something that I don't really understand and that might be a bug. In the docs, it is stated that for the best performance, one should normalize the observations to between -1 and 1 or between 0 and 1. However, doing that is making the agent train slower. More information below:

I'll write more details about the environment, inputs, etc. below, but here are the results: input-results

The environment:

Inputs:

Every training session was done using the same hyperparameters and the same run-seed in the same environment. The model was trained using only a single agent for 500k steps.

Here's what I changed between runs:

note: All normalization was done using the same function, which normalizes between 0 and 1. To get the normalization between -10 and 10, for example, the inputs were normalized between 0 and 1, then multiplied by 20 and subtracted from 10.
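That rescaling can be sketched in Python (helper names are mine, not from the project): values are first mapped to [0, 1], then the [0, 1] value is scaled and subtracted from the target bound, so for [-10, 10] that is `10 - 20 * n`. Note that this form also reverses direction: the minimum input maps to +10 and the maximum to -10.

```python
def normalize(val, lo, hi):
    """Map val from [lo, hi] onto [0, 1], clamping out-of-range values."""
    return min(max((val - lo) / (hi - lo), 0.0), 1.0)

def rescale(val, lo, hi, target):
    """Map val from [lo, hi] onto [-target, target] via the [0, 1] form.

    Mirrors the thread's `target - 2 * target * n` pattern, which maps
    lo -> +target and hi -> -target (the direction is flipped).
    """
    n = normalize(val, lo, hi)
    return target - n * 2 * target

# e.g. rescaling a ball offset in [-3, 3] to the [-10, 10] range:
# rescale(-3, -3, 3, 10) -> 10.0, rescale(3, -3, 3, 10) -> -10.0
```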

Package information:

kkalera commented 3 years ago

Update:

I've tried the same scenario using the ML-Agents Release 18 with Python package 0.27.0 and am getting similar results.

This time I used one of the provided example environments, 3DBall. Below are the results:

3DBall results

While the normalized values between 0 and 1 don't have such a dramatic effect on learning as in my custom environment, there is a clear difference in learning speed.

note: All training runs were performed with the provided config file "config/ppo/3DBall.yaml". The only alteration was in the processing of the inputs.

Processing of the inputs looks like this:

    public override void CollectObservations(VectorSensor sensor)
    {
        if (useVecObs)
        {
            // Baseline: raw, unnormalized observations.
            /*sensor.AddObservation(gameObject.transform.rotation.z);
            sensor.AddObservation(gameObject.transform.rotation.x);
            sensor.AddObservation(ball.transform.position - gameObject.transform.position);
            sensor.AddObservation(m_BallRb.velocity);*/

            // Normalized to [0, 1].
            /*sensor.AddObservation(Normalize(gameObject.transform.rotation.z, 0, 360));
            sensor.AddObservation(Normalize(gameObject.transform.rotation.x, 0, 360));
            sensor.AddObservation(NormalizeVector3(ball.transform.position - gameObject.transform.position, -3, 3));
            sensor.AddObservation(NormalizeVector3(m_BallRb.velocity, -10, 10));*/

            // Normalized to [0, 1], then scaled to [0, 10].
            /*sensor.AddObservation(Normalize(gameObject.transform.rotation.z, 0, 360) * 10);
            sensor.AddObservation(Normalize(gameObject.transform.rotation.x, 0, 360) * 10);
            sensor.AddObservation(NormalizeVector3(ball.transform.position - gameObject.transform.position, -3, 3) * 10);
            sensor.AddObservation(NormalizeVector3(m_BallRb.velocity, -10, 10) * 10);*/

            // Rescaled to [-1, 1] via 1 - 2n.
            /*sensor.AddObservation(1 - Normalize(gameObject.transform.rotation.z, 0, 360) * 2);
            sensor.AddObservation(1 - Normalize(gameObject.transform.rotation.x, 0, 360) * 2);
            sensor.AddObservation(Vector3.one - NormalizeVector3(ball.transform.position - gameObject.transform.position, -3, 3) * 2);
            sensor.AddObservation(Vector3.one - NormalizeVector3(m_BallRb.velocity, -10, 10) * 2);*/

            // Rescaled to [-10, 10] via 10 - 20n (the variant active for this run).
            sensor.AddObservation(10 - Normalize(gameObject.transform.rotation.z, 0, 360) * 20);
            sensor.AddObservation(10 - Normalize(gameObject.transform.rotation.x, 0, 360) * 20);
            sensor.AddObservation(Vector3.one * 10 - NormalizeVector3(ball.transform.position - gameObject.transform.position, -3, 3) * 20);
            sensor.AddObservation(Vector3.one * 10 - NormalizeVector3(m_BallRb.velocity, -10, 10) * 20);
        }
    }

    // Maps val from [min, max] onto [0, 1], clamping out-of-range values.
    public static float Normalize(float val, float min, float max)
    {
        return Mathf.Clamp((val - min) / (max - min), 0, 1);
    }

    // Applies the same [0, 1] mapping to each component of a Vector3.
    public static Vector3 NormalizeVector3(Vector3 val, float min, float max)
    {
        val.x = Mathf.Clamp((val.x - min) / (max - min), 0, 1);
        val.y = Mathf.Clamp((val.y - min) / (max - min), 0, 1);
        val.z = Mathf.Clamp((val.z - min) / (max - min), 0, 1);
        return val;
    }

ervteng commented 3 years ago

This is interesting. I noticed that you clamp everything to between 0 and 1. This will usually cause some issues with data being clipped away. For instance, in 3DBall the observations can be both positive and negative; if you "normalize" between 0 and 1, you can still have negative values, which will be clipped away by the Mathf.Clamp function.
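The clipping effect can be illustrated with a small Python sketch (my own toy example, not ML-Agents code): clamping raw signed observations to [0, 1] discards everything below zero, whereas normalizing with the true min/max of the range first keeps the values distinct.

```python
def normalize(val, lo, hi):
    """Map val from [lo, hi] onto [0, 1], clamping out-of-range values."""
    return min(max((val - lo) / (hi - lo), 0.0), 1.0)

raw = [-3.0, -1.5, 0.0, 1.5, 3.0]  # signed observations, e.g. ball offsets

# Clamping the raw values directly wipes out the negative half:
clipped = [min(max(v, 0.0), 1.0) for v in raw]   # [0.0, 0.0, 0.0, 1.0, 1.0]

# Normalizing with the true range (-3 to 3) first preserves the ordering:
scaled = [normalize(v, -3.0, 3.0) for v in raw]  # [0.0, 0.25, 0.5, 0.75, 1.0]
```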

kkalera commented 3 years ago

Thanks for your reply @ervteng. I see what you're saying. I did normalize the positions using the full range available to the agent, including the negative positions. I checked the maximum positions possible for the ball, which were -3 and 3, and then normalized between those values, so -3 would result in 0 and 3 would result in 1.

Like in the code above: sensor.AddObservation(NormalizeVector3(ball.transform.position - gameObject.transform.position, -3, 3));

I'll redo my tests without clamping and report back with the results.

kkalera commented 3 years ago

I did some retesting. The clamping did seem to have some effect on performance, though it's unclear to me why, since I normalized using the negative values. That said, there is still a performance loss when normalizing the values.

Here are the results: new testing

To be honest, I have no clue what's causing this difference in performance; it might be something inherent to the algorithm. My knowledge of the subject is currently not enough to speculate about why I'm getting these results.

Something that does seem interesting to me: when normalized between -1 and 1, entropy dropped more significantly than the baseline, but training took about 15% longer to converge. Would the lower entropy result in a more stable model?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 42 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.