edomedo00 / sporing

A small interactive experience where AI-driven minions help you through a forest to find the secret of their existence.

Agent to push the ball into the hole #11

Closed edomedo00 closed 1 month ago

JuanGdev commented 2 months ago

Due to the complexity of the final training, I moved the task to the To Do section to start with simpler tasks.

JuanGdev commented 1 month ago

Adding some useful resources for this Issue:

I'm currently focused on GAIL and Behavioral Cloning combined with extrinsic rewards. I'm looking forward to using intrinsic rewards for the final agent functionality.
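For reference, a minimal sketch of what adding an intrinsic reward could look like in the trainer config. It assumes the ML-Agents `curiosity` reward signal is the one chosen (nothing in this issue has settled on a specific intrinsic signal), and the values below are illustrative starting points, not tuned for this project:

```yaml
behaviors:
  Bolas:
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      # Assumption: curiosity is used as the intrinsic signal; values are untuned placeholders.
      curiosity:
        gamma: 0.99
        strength: 0.02
        learning_rate: 0.0003
```

The intrinsic `strength` is usually kept small relative to the extrinsic one so that exploration bonuses do not drown out the actual task reward.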

JuanGdev commented 1 month ago

PS. If you want to start from a pre-trained model, you need to include the `init_path` value in the YAML configuration file. Note that the new model you want to train must use the same configuration settings as the pre-trained one.
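A minimal sketch of where `init_path` goes, assuming it is set at the behavior level of the config. The checkpoint path is a hypothetical placeholder (the run ID and file name are not from this project), so replace it with a checkpoint from your own earlier run:

```yaml
behaviors:
  Bolas:
    trainer_type: ppo
    # Hypothetical placeholder path: point this at a checkpoint saved by a previous run.
    init_path: results/<previous-run-id>/Bolas/checkpoint.pt
    # The remaining settings (hyperparameters, network_settings, ...) should match
    # the ones used when the checkpoint was trained.
```

If every behavior should be initialized from the same earlier run, the `--initialize-from` option of `mlagents-learn` is an alternative to setting `init_path` per behavior.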

JuanGdev commented 1 month ago

Adding the configuration file I'm using:

behaviors:
  Bolas:
    trainer_type: ppo
    hyperparameters:
      batch_size: 512
      buffer_size: 4096
      learning_rate: 0.0003
      beta: 0.05
      epsilon: 0.1
      lambd: 0.95
      num_epoch: 5
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 512
      num_layers: 3
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      gail:
        gamma: 0.99
        strength: 0.01
        network_settings:
          normalize: false
          hidden_units: 128
          num_layers: 2
          vis_encode_type: simple
        learning_rate: 0.0003
        use_actions: false
        use_vail: false
        demo_path: C:\Users\juana\Documents\Verano2024\spooringTrainingMLAgents\Assets\Demonstrations\bolasDemoRandomW.demo 
    keep_checkpoints: 5
    max_steps: 1000000
    time_horizon: 256
    summary_freq: 50000
    behavioral_cloning:
      demo_path: C:\Users\juana\Documents\Verano2024\spooringTrainingMLAgents\Assets\Demonstrations\bolasDemoRandomW.demo
      steps: 50000
      strength: 1.0
      samples_per_update: 0