Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Agent training is working fine but exported model does not work in Unity #5203

Closed · shohanulka closed this issue 3 years ago

shohanulka commented 3 years ago

Describe the bug
I have been training a vehicle agent to follow a simple track, using behavioural cloning. While training I can see the model working fine, but after I export it into Unity the agent just gets stuck. Also, the exported model file is very small, only 95 KB.

Here is my trainer config:

default:
    trainer: ppo
    batch_size: 1024
    beta: 5.0e-3
    buffer_size: 10240
    epsilon: 0.2
    hidden_units: 128
    lambd: 0.99
    learning_rate: 3.0e-4
    max_steps: 5000000
    memory_size: 256
    normalize: false
    num_epoch: 3
    num_layers: 2
    time_horizon: 64
    sequence_length: 64
    summary_freq: 10000
    use_recurrent: false
    reward_signals:
        extrinsic:
            strength: 1.0
            gamma: 0.99

RaceAgent:
    summary_freq: 10000
    time_horizon: 64
    batch_size: 256
    buffer_size: 2048
    hidden_units: 128
    num_layers: 2
    beta: 5.0e-4
    learning_rate_schedule: linear
    max_steps: 5.0e7
    num_epoch: 3
    behavioral_cloning:
        demo_path: RaceAgentN4_1.demo
        strength: 1.0
        steps: 150000
    reward_signals:
        extrinsic:
            strength: 0.1
            gamma: 0.99
        curiosity:
            strength: 0.01
            gamma: 0.90
            encoding_size: 256
        gail:
            strength: 1.0
            gamma: 0.99
            encoding_size: 128
            demo_path: RaceAgentN4_1.demo

And here is my agent script:
using System;
using System.Linq;
using UnityEngine;
using Random = UnityEngine.Random;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

// todo change sensor to child object
// todo avoid obstacle
// collect food
// hit other player
// lap counter

public class BikeAgent : Agent
{
    [SerializeField] private Transform m_SpawnPos;
    [SerializeField] private Vehicle _vehicle; // ref
    [SerializeField] private TrackCheckpoints _trackCheckpoints; // ref
    [SerializeField] private Transform m_BikeSphere;

    public override void Initialize()
    {
        base.Initialize();
        _trackCheckpoints.OnPlayerCorrectCheckpoint += OnCorrectCheckPoint;
        // subscribe (+=), not unsubscribe (-=), or the wrong-checkpoint penalty never fires
        _trackCheckpoints.OnPlayerWrongCheckpoint += OnWrongCheckPoint;
        _vehicle.StopVehicle = false;
    }

    //reward 
    void OnCorrectCheckPoint(Transform carTransform, bool isLapComplete)
    {
        //bike sphere 
        if(carTransform == this.m_BikeSphere)
        {
            AddReward(1f);
            if (isLapComplete) AddReward(1f);
            // print("Reward");
        }

    }

    //punish 
    void OnWrongCheckPoint(Transform carTransform)
    {
        if (carTransform == this.m_BikeSphere)
        {
            AddReward(-1f);
        }
    }

    public override void OnEpisodeBegin()
    {
        base.OnEpisodeBegin();

        //reset vehicle
        ResetVehicle();
    }

    void ResetVehicle()
    {
        _vehicle.StopVehicle = true;
        Vector3 spawnPos = m_SpawnPos.position + new Vector3(Random.Range(-3f, 3f), 0.75f, Random.Range(-2f, 2f));
        transform.position = spawnPos;
        m_BikeSphere.position = spawnPos;
        transform.forward = m_SpawnPos.forward;
        m_BikeSphere.forward = m_SpawnPos.forward;

        _trackCheckpoints.ResetCheckPoint(m_BikeSphere);       
        //todo reset checkpoint 
    }

    //collect observation
    public override void CollectObservations(VectorSensor sensor)
    {
        base.CollectObservations(sensor);

        Vector3 checkPointForward = _trackCheckpoints.GetNextCheckPoint(this.m_BikeSphere).transform.forward;
        float dirDot = Vector3.Dot(this.transform.forward, checkPointForward);
        sensor.AddObservation(dirDot);

        //m_SpawnPos.transform.position = _trackCheckpoints.GetPreviousCheckPoint(this.m_BikeSphere).position;
        //print(dirDot);
    }

    //action received 
    public override void OnActionReceived(float[] vectorAction)
    {
        base.OnActionReceived(vectorAction);

        //get off from the track 
        if (transform.position.y < 0f)
        {
            AddReward(-1f);
            EndEpisode();
        }
        else
        {
            _vehicle.StopVehicle = false;
        }

        // discrete action branches arrive as floats; convert to ints before switching
        int forwardAmount = Mathf.FloorToInt(vectorAction[0]);
        int turnAmount = Mathf.FloorToInt(vectorAction[1]);

        switch (forwardAmount)
        {
            case 0:
                //idle
                break;
            case 1:
                //forward
                _vehicle.ControlAccelerate();
                break;
            case 2:
                //backward
                _vehicle.ControlBrake();
                break;
        }

        switch (turnAmount)
        {
            case 0:
                //idle
                break;
            case 1:
                //left
                _vehicle.ControlSteer(-1);
                break;
            case 2:
                //right
                _vehicle.ControlSteer(1);
                break;
        }

        AddReward(-1f / MaxStep);
    }

    public override void Heuristic(float[] actionsOut)
    {
        base.Heuristic(actionsOut);

        //default idle
        actionsOut[0] = 0; // forward 
        actionsOut[1] = 0; // turn 

        // accelerate
        if (Input.GetKey(KeyCode.W)) actionsOut[0] = 1;
        // brake
        if (Input.GetKey(KeyCode.S)) actionsOut[0] = 2;

        // turn left  
        if (Input.GetKey(KeyCode.A)) actionsOut[1] = 1;
        // turn right 
        if (Input.GetKey(KeyCode.D)) actionsOut[1] = 2;  

    }

    //todo collision obstacle reward etc

    private void OnCollisionEnter(Collision other)
    {
        if (other.gameObject.CompareTag("wall"))
        {
            AddReward(-0.05f);
        }
    }

    private void OnCollisionStay(Collision other)
    {
        if (other.gameObject.CompareTag("wall"))
        {
            AddReward(-0.01f);
        }
    }

    // Update is called once per frame
    void Update()
    {
        //todo update UI
    }
}

dongruoping commented 3 years ago

Hi, a few things I'd like to clarify:

  1. Just to double-check, did you drag and drop the trained model onto your agent? Did it import successfully, without any errors (i.e., the "model" field shows the model name you assigned, and no warnings or errors appeared)?
  2. Did you modify anything between running training and inference, like the timescale or any other configuration?
  3. By "the agent just gets stuck", is your agent doing something but not acting the same way as in training, or is it not working at all?
shohanulka commented 3 years ago

To answer your questions:

  1. Yes, I dragged and dropped the model without any errors.
  2. No. My training command was mlagents-learn race_config --run-id="test1" --time-scale=3; in Unity the timescale was the default of 1.
  3. It is not acting the same as in training. During training I can see it acting as expected and the mean reward keeps increasing, but in the game the agent just stops where it started. One thing I have noticed is that after 1 million steps the model file is only 93 KB; shouldn't the size increase? I guess the agent is not getting data from the model.
dongruoping commented 3 years ago

There's no Barracuda inference issue that we're aware of right now. My guess is that if your game has physics that depend on the time scale, you might see issues when running training and inference at different time scales. Have you tried running training and inference at the same time scale?
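
A minimal sketch of the kind of time-scale-dependent physics being described here, assuming a Rigidbody-driven vehicle; the component, field names, and numbers are illustrative and not from this project:

using UnityEngine;

// Illustrative sketch: why movement written against Update can behave
// differently at --time-scale=3 than at the editor default of 1.
[RequireComponent(typeof(Rigidbody))]
public class MoverSketch : MonoBehaviour
{
    [SerializeField] private float speed = 5f; // hypothetical tuning value
    private Rigidbody _rb;

    void Awake()
    {
        _rb = GetComponent<Rigidbody>();
    }

    // Fragile pattern: moving the transform directly in Update bypasses the
    // physics step, so the ratio of physics ticks to rendered frames shifts
    // when Time.timeScale changes, and the trained policy sees different
    // dynamics at inference time.
    // void Update()
    // {
    //     transform.position += transform.forward * speed * Time.deltaTime;
    // }

    // Safer pattern: drive movement from FixedUpdate through the Rigidbody,
    // which stays in lockstep with the physics simulation at any time scale.
    void FixedUpdate()
    {
        _rb.MovePosition(_rb.position + transform.forward * (speed * Time.fixedDeltaTime));
    }
}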

shohanulka commented 3 years ago
You have a point; let me try running both at the same time-scale. But why is the model size so small after so many steps? Is that fine?

dongruoping commented 3 years ago

Sure, please update if you're able to get it running or if we need to dig in further.

Also, the model size doesn't change with the number of training steps; it's determined by your network configuration (hidden_units, num_layers, memory_size, etc.). I'm not sure exactly how small yours is, but if you can import it in the editor successfully then it should be fine.
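
As a rough back-of-the-envelope check (my numbers, not from this thread): with hidden_units: 128 and num_layers: 2, each hidden layer holds about 128 × 128 ≈ 16k weights, and the input and output projections add a few thousand more, so the exported policy is on the order of 20k–40k float32 parameters. At 4 bytes per parameter that is well under 200 KB, and the GAIL and curiosity networks are used only during training and are not exported, so a 93–95 KB file is entirely plausible for this configuration.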

shohanulka commented 3 years ago

Thank you for your help. I solved the issue by running both at the same time-scale 💯 @dongruoping.
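
For reference, one way to pin inference to the time scale used during training (a minimal sketch; the component and its field are hypothetical, and re-training with --time-scale=1 would work equally well):

using UnityEngine;

// Sets the engine time scale at startup so inference runs at the same
// speed the policy was trained at (3, per the command earlier in the thread).
public class MatchTrainingTimeScale : MonoBehaviour
{
    [SerializeField] private float trainingTimeScale = 3f;

    void Awake()
    {
        Time.timeScale = trainingTimeScale;
    }
}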

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.