MoveToGoal1Object
[SerializeField] private float timePenalty = -0.01f;
Within OnActionReceived:
SetReward(timePenalty * (1 - Mathf.Abs(rotateAmount) - Mathf.Abs(moveAmount)));
OnTriggerEnter
if (other.CompareTag("Goal"))
{
SetReward(+1.0f);
targets.Remove(other.transform);
Destroy(other.gameObject);
// End episode if all targets are collected
if (targets.Count == 0)
{
float timeRemaining = (MaxStep - StepCount) / (float)MaxStep;
SetReward(1f + timeRemaining);
EndEpisode();
}
}
if (other.CompareTag("Wall"))
{
SetReward(-10f);
EndEpisode();
}
}
Training YML file
behaviors:
  MovetoGoal:
    trainer_type: ppo
    hyperparameters:
      batch_size: 10
      buffer_size: 100
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
      beta_schedule: constant
      epsilon_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
moveToGoalNewConfig1Goal
behaviors:
  MovetoGoal:
    trainer_type: ppo
    hyperparameters:
      batch_size: 10
      buffer_size: 100
      learning_rate: 1.0e-4  # Adjusted learning rate
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
      beta_schedule: constant
      epsilon_schedule: linear
    network_settings:
      normalize: true  # Use normalized inputs
      hidden_units: 256  # Increased hidden units
      num_layers: 3  # Increased number of layers
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 1000000  # Increased max steps
    time_horizon: 128  # Increased time horizon
    summary_freq: 5000  # Increased summary frequency
moveToGoalNewConfig1GoalNew
behaviors:
  MovetoGoal:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        strength: 0.02
        gamma: 0.99
        encoding_size: 256
        learning_rate: 3.0e-4
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 10000
    threaded: true
moveToGoalNewConfig1Goal2
behaviors:
  MovetoGoal:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 1.0e-4  # Adjusted learning rate
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
      beta_schedule: constant
      epsilon_schedule: linear
    network_settings:
      normalize: true  # Use normalized inputs
      hidden_units: 256  # Increased hidden units
      num_layers: 3  # Increased number of layers
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        strength: 0.02
        gamma: 0.99
        encoding_size: 256
        learning_rate: 3.0e-4
    max_steps: 1000000  # Increased max steps
    time_horizon: 128  # Increased time horizon
    summary_freq: 5000  # Increased summary frequency
    threaded: true
Modified the reward to SetReward(+0.1f); this way the AI will be more willing to collect the coins and try to catch them all.
if (other.CompareTag("Goal"))
{
SetReward(+0.1f);
targets.Remove(other.transform);
Destroy(other.gameObject);
// End episode if all targets are collected
if (targets.Count == 0)
{
float timeRemaining = (MaxStep - StepCount) / (float)MaxStep;
SetReward(1f + timeRemaining);
EndEpisode();
}
}
moveToGoalNewConfig1GoalGroupOf2 inherits from the moveToGoalNewConfig1Goal2 data and uses the same training config file, with two objects to collect.
This line of the code wasn't calculating the penalty correctly: it could still produce a positive number even though it should be a negative value.
SetReward(timePenalty * (1 - Mathf.Abs(rotateAmount) - Mathf.Abs(moveAmount)));
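A minimal sketch of one possible fix, assuming rotateAmount and moveAmount are continuous actions in the [-1, 1] range (the clamp is my addition, not the project's actual code):

// Clamp the activity term to [0, 1] so multiplying by the negative timePenalty
// can never flip the sign and accidentally reward the agent.
float activity = Mathf.Clamp01(1f - Mathf.Abs(rotateAmount) - Mathf.Abs(moveAmount));
SetReward(timePenalty * activity); // timePenalty = -0.01f, so the result is always <= 0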
Corrected the problem by adding a better reward based on the time remaining when all of the targets are collected.
// If the agent collides with a target, remove the target from the list and add a reward
if (other.CompareTag("Goal"))
{
    SetReward(+0.1f);
    targets.Remove(other.transform);
    Destroy(other.gameObject);
    // End episode if all targets have been collected
    if (targets.Count == 0)
    {
        print("Targets collected");
        float timeRemaining = (MaxStep - StepCount) / (float)MaxStep;
        SetReward(1f + timeRemaining);
        EndEpisode();
    }
}
CorrectMovetoGoal1Object\MovetoGoal
As the AI still has problems collecting targets near the wall, and training specifically near the wall didn't help much, the number of raycasts will be increased to 50. This way, the AI will have a better understanding of what is going on around it.
Overall, 50 raycasts will be kept, as they perform better than other solutions such as camera pixels; a camera at a resolution of 32 x 32, for example, would pass 1024 inputs to the AI instead of only 50.
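The ray sensor is normally configured on a RayPerceptionSensorComponent3D in the Inspector; the sketch below only illustrates roughly the same setup in code, and is not the project's actual script. The tag list, ray length, and attachment point are assumptions, and ML-Agents builds 2 x RaysPerDirection + 1 rays, so 25 per direction gives a total in the region of 50:

using System.Collections.Generic;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class RaySensorSetup : MonoBehaviour
{
    private void Awake()
    {
        // Assumes a RayPerceptionSensorComponent3D is attached to the agent object.
        var sensor = GetComponent<RayPerceptionSensorComponent3D>();
        sensor.RaysPerDirection = 25;   // 2 * 25 + 1 rays in total, roughly the 50 mentioned above
        sensor.MaxRayDegrees = 180f;    // spread the rays behind the agent as well as in front
        sensor.RayLength = 20f;         // assumed range, tune to the size of the arena
        sensor.DetectableTags = new List<string> { "Goal", "Wall" }; // tags the rays can report
    }
}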
The mouse AI is very similar to the cat AI, but its goal is to run away, which means the script needs to be adjusted for the mouse AI.
Within OnEpisodeBegin, so the timer used for EndEpisode restarts each episode:
timer = 0f;
Within the OnActionReceived function:
// Add reward for time survived
timer += Time.deltaTime;
AddReward(timeSurvivedReward * timer);
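Put together, the mouse's survival reward could look roughly like the sketch below; timeSurvivedReward and timer are the fields implied by the snippets above, while the class layout and the example value are assumptions of mine rather than the project's actual script:

using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class MouseAgent : Agent
{
    [SerializeField] private float timeSurvivedReward = 0.001f; // assumed value, tuned during training

    private float timer;

    public override void OnEpisodeBegin()
    {
        timer = 0f; // restart the survival timer for the new episode
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Movement handling omitted; only the survival reward is sketched here.
        timer += Time.deltaTime;
        AddReward(timeSurvivedReward * timer); // reward grows the longer the mouse survives
    }
}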
FleefromAgent\FleefromAgent
Now the cat AI agent will be trained against the mouse AI, so it will learn new methods for capturing the mouse.
Now the mouse AI is trained against the cat AI, so the mouse AI will learn how not to get captured.
The cat AI then starts full training against all 20 mice, so it will improve at capturing all of them.
Overall, the cat AI improved in the final scenario to capture all 20 mice, and the mice were running away from the cat AI. This shows that the ML agents are working correctly for this scenario and that the AIs were successfully created for this type of scenario.
Different attempts at training.
I tried to give the ML agent its own location and the locations of the targets as inputs, but I didn't know how to make it work. The ML agent needs consistent inputs, but when targets were collected they were deleted, and the ML agent was still left with stale data for them. This made the ML agent go to random places with a very low success rate. So I chose 3D raycasts as the sensor for the ML agent; this way, the inputs are consistent and kept up to date.
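For reference, one way the position-based observations could have been kept at a fixed size is to always write one slot per possible target and pad the slots of targets that have already been collected. This is only a hedged sketch of that idea; the class name, field names, padding value, and presence flag are mine, not the project's:

using System.Collections.Generic;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class FixedSizeObservationAgent : Agent
{
    [SerializeField] private int maxTargets = 2;                    // assumed upper bound on targets in the scene
    private readonly List<Transform> targets = new List<Transform>();

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition);             // the agent's own position

        // Always emit exactly maxTargets entries so the observation size never changes.
        for (int i = 0; i < maxTargets; i++)
        {
            if (i < targets.Count && targets[i] != null)
            {
                sensor.AddObservation(targets[i].localPosition);    // live target position
                sensor.AddObservation(1f);                          // "target still present" flag
            }
            else
            {
                sensor.AddObservation(Vector3.zero);                // padding for a collected target
                sensor.AddObservation(0f);                          // "target gone" flag
            }
        }
    }
}

In the project, the 3D ray sensor was used instead, since it keeps the observation size constant without any of this bookkeeping.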