Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Training regression between ML-Agents 0.8 and ML-Agents 0.13.1 #3552

Closed TouraisDavid closed 2 years ago

TouraisDavid commented 4 years ago

Describe the bug
We are migrating from ML-Agents 0.8 to 0.13.1, using Unity 2018.4.15 for both. The project is a single unmanned aerial vehicle controlled by one policy. Updates are done in FixedUpdate() every 20 ms. We have linked Unity with the X-Plane flight simulator.

We replaced the Brain with a Behavior Parameters script, updated the Assets/ML-Agents folder, and added the Barracuda package version 0.3.2-preview via the Unity Package Manager. Inference with an already trained policy works well, but there is a regression in training performance: one player loop takes 5 ms with ML-Agents 0.8, while it takes around 80 ms with ML-Agents 0.13.1. The Unity Profiler shows that with ML-Agents 0.13, 73% of the loop time is spent in the FixedUpdate.ScriptRunBehaviourFixedUpdate() function; more precisely, root.DecideAction() accounts for 50%. Have you ever faced such a regression?
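For scale, the reported loop times can be put against the 20 ms FixedUpdate budget. This is a back-of-the-envelope check using only the figures quoted above; nothing here is measured:

```python
# Sanity check of the reported timings against the fixed timestep budget.
# All numbers come from the bug report above; nothing here is measured.
FIXED_TIMESTEP_MS = 20.0  # FixedUpdate period used in this project

loop_ms = {"ML-Agents 0.8": 5.0, "ML-Agents 0.13.1": 80.0}

for version, ms in loop_ms.items():
    fraction = ms / FIXED_TIMESTEP_MS
    print(f"{version}: {ms:.0f} ms per player loop "
          f"= {fraction:.2f}x the 20 ms fixed timestep")

# 0.8 uses a quarter of the budget, while 0.13.1 needs four times the
# budget, so the simulation can no longer keep up with real time.
```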

To Reproduce
Due to my custom environment, I don't have steps to reproduce.

Console logs / stack traces

```
(ml-agents_0.8) >mlagents-learn XXX\trainer_config.yaml --run-id=UAV_ml_0_8 --train
INFO:mlagents.trainers:{'--base-port': '5005', '--curriculum': 'None', '--debug': False, '--docker-target-name': 'None', '--env': 'None', '--help': False, '--keep-checkpoints': '5', '--lesson': '0', '--load': False, '--no-graphics': False, '--num-envs': '1', '--num-runs': '1', '--run-id': 'UAV_dto_ml_0_8', '--save-freq': '50000', '--seed': '-1', '--slow': False, '--train': True, '': 'XXX\trainer_config.yaml'}
XXX\ml-agents_0.8\lib\site-packages\mlagents\trainers\learn.py:141: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  trainer_config = yaml.load(data_file)
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
INFO:mlagents.envs:
'Academy_02' started successfully!
Unity Academy name: Academy_02
        Number of Brains: 2
        Number of Training Brains : 1
        Reset Parameters :

Unity brain name: StationnaireConcept_BrainLearning_02
        Number of Visual Observations (per agent): 0
        Vector Observation space size (per agent): 20
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): [4]
        Vector Action descriptions: , , ,
Unity brain name: StationnaireConcept_BrainPlayer_02
        Number of Visual Observations (per agent): 0
        Vector Observation space size (per agent): 14
        Number of stacked Vector Observation: 3
        Vector Action space type: continuous
        Vector Action space size (per agent): [4]
        Vector Action descriptions: , , ,
2020-02-19 15:49:04.781559: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:mlagents.envs:Hyperparameters for the PPO Trainer of brain StationnaireConcept_BrainLearning_02:
        batch_size: 1024
        beta: 0.005
        buffer_size: 10240
        epsilon: 0.2
        gamma: 0.99
        hidden_units: 128
        lambd: 0.95
        learning_rate: 0.0003
        max_steps: 5.0e3
        normalize: False
        num_epoch: 3
        num_layers: 2
        time_horizon: 64
        sequence_length: 64
        summary_freq: 1000
        use_recurrent: False
        summary_path: ./summaries/UAV_dto_ml_0_8-0_StationnaireConcept_BrainLearning_02
        memory_size: 256
        use_curiosity: False
        curiosity_strength: 0.01
        curiosity_enc_size: 128
        model_path: ./models/UAV_dto_ml_0_8-0/StationnaireConcept_BrainLearning_02
cmd.payload[0]=None
cmd.payload[1]=True
INFO:mlagents.trainers: UAV_dto_ml_0_8-0: StationnaireConcept_BrainLearning_02: Step: 1000. Time Elapsed: 9.862 s Mean Reward: 0.657. Std of Reward: 0.657. Training.
INFO:mlagents.trainers: UAV_dto_ml_0_8-0: StationnaireConcept_BrainLearning_02: Step: 2000. Time Elapsed: 18.290 s Mean Reward: 0.295. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: UAV_dto_ml_0_8-0: StationnaireConcept_BrainLearning_02: Step: 3000. Time Elapsed: 26.793 s Mean Reward: 0.916. Std of Reward: 0.446. Training.
INFO:mlagents.trainers: UAV_dto_ml_0_8-0: StationnaireConcept_BrainLearning_02: Step: 4000. Time Elapsed: 35.259 s Mean Reward: 0.324. Std of Reward: 0.289. Training.
INFO:mlagents.trainers: UAV_dto_ml_0_8-0: StationnaireConcept_BrainLearning_02: Step: 5000. Time Elapsed: 43.688 s Mean Reward: 0.419. Std of Reward: 0.274. Training.
INFO:mlagents.envs:Saved Model
INFO:mlagents.trainers:List of nodes to export for brain :StationnaireConcept_BrainLearning_02
INFO:mlagents.trainers:  is_continuous_control
INFO:mlagents.trainers:  version_number
INFO:mlagents.trainers:  memory_size
INFO:mlagents.trainers:  action_output_shape
INFO:mlagents.trainers:  action
INFO:mlagents.trainers:  action_probs
INFO:mlagents.trainers:  value_estimate
INFO:tensorflow:Restoring parameters from ./models/UAV_dto_ml_0_8-0/StationnaireConcept_BrainLearning_02\model-5001.cptk
INFO:tensorflow:Froze 17 variables.
Converted 17 variables to const ops.
Converting ./models/UAV_dto_ml_0_8-0/StationnaireConcept_BrainLearning_02/frozen_graph_def.pb to ./models/UAV_dto_ml_0_8-0/StationnaireConcept_BrainLearning_02.nn
IGNORED: StopGradient unknown layer
GLOBALS: 'is_continuous_control', 'version_number', 'memory_size', 'action_output_shape'
IN: 'vector_observation': [-1, 1, 1, 20] => 'main_graph_0/hidden_0/BiasAdd'
IN: 'vector_observation': [-1, 1, 1, 20] => 'main_graph_1/hidden_0/BiasAdd'
IN: 'epsilon': [-1, 1, 1, 4] => 'mul'
OUT: 'action', 'action_probs', 'value_estimate'
DONE: wrote ./models/UAV_dto_ml_0_8-0/StationnaireConcept_BrainLearning_02.nn file.
INFO:mlagents.trainers:Exported ./models/UAV_dto_ml_0_8-0/StationnaireConcept_BrainLearning_02.nn file
```

```
(ml-agents_0.13.1) >mlagents-learn XXX\trainer_config.yaml --run=UAV13 --train
Version information:
  ml-agents: 0.13.1,
  ml-agents-envs: 0.13.1,
  Communicator API: API-13,
  TensorFlow: 1.7.1
INFO:mlagents.trainers:CommandLineOptions(debug=False, num_runs=1, seed=-1, env_path=None, run_id='UAV13', load_model=False, train_model=True, save_freq=50000, keep_checkpoints=5, base_port=5005, num_envs=1, curriculum_folder=None, lesson=0, no_graphics=False, multi_gpu=False, trainer_config_path='XXX\trainer_config.yaml', sampler_file_path=None, docker_target_name=None, env_args=None, cpu=False, width=84, height=84, quality_level=5, time_scale=20, target_frame_rate=-1)
INFO:mlagents_envs:Listening on port 5004. Start training by pressing the Play button in the Unity Editor.
INFO:mlagents_envs:Connected new brain: UAVBehavior?team=0
INFO:mlagents.trainers:Hyperparameters for the PPOTrainer of brain UAVBehavior:
        trainer: ppo
        batch_size: 1024
        beta: 0.005
        buffer_size: 10240
        epsilon: 0.2
        hidden_units: 128
        lambd: 0.95
        learning_rate: 0.0003
        learning_rate_schedule: linear
        max_steps: 5.0e3
        memory_size: 256
        normalize: False
        num_epoch: 3
        num_layers: 2
        time_horizon: 64
        sequence_length: 64
        summary_freq: 1000
        use_recurrent: False
        vis_encode_type: simple
        reward_signals:
          extrinsic:
            strength: 1.0
            gamma: 0.99
        summary_path: UAV13_UAVBehavior
        model_path: ./models/UAV13-0/UAVBehavior
        keep_checkpoints: 5
2020-03-02 13:56:00.650783: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:mlagents.trainers: UAV13: UAVBehavior: Step: 1000. Time Elapsed: 12.471 s Mean Reward: 0.837. Std of Reward: 0.647. Training.
INFO:mlagents.trainers: UAV13: UAVBehavior: Step: 2000. Time Elapsed: 18.091 s Mean Reward: 1.862. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: UAV13: UAVBehavior: Step: 3000. Time Elapsed: 23.494 s Mean Reward: 0.395. Std of Reward: 0.188. Training.
INFO:mlagents.trainers: UAV13: UAVBehavior: Step: 4000. Time Elapsed: 28.933 s Mean Reward: 2.482. Std of Reward: 0.000. Training.
INFO:mlagents.trainers: UAV13: UAVBehavior: Step: 5000. Time Elapsed: 34.346 s Mean Reward: 0.118. Std of Reward: 0.000. Training.
INFO:mlagents.trainers:Saved Model
INFO:mlagents.trainers:List of nodes to export for brain :UAVBehavior?team=0
INFO:mlagents.trainers:  is_continuous_control
INFO:mlagents.trainers:  version_number
INFO:mlagents.trainers:  memory_size
INFO:mlagents.trainers:  action_output_shape
INFO:mlagents.trainers:  action
INFO:mlagents.trainers:  action_probs
Converted 11 variables to const ops.
Converting ./models/UAV13-0/UAVBehavior/frozen_graph_def.pb to ./models/UAV13-0/UAVBehavior.nn
IGNORED: StopGradient unknown layer
GLOBALS: 'is_continuous_control', 'version_number', 'memory_size', 'action_output_shape'
IN: 'vector_observation': [-1, 1, 1, 20] => 'main_graph_0/hidden_0/BiasAdd'
IN: 'epsilon': [-1, 1, 1, 4] => 'mul'
OUT: 'action', 'action_probs'
DONE: wrote ./models/UAV13-0/UAVBehavior.nn file.
INFO:mlagents.trainers:Exported ./models/UAV13-0/UAVBehavior.nn file
```
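For reference, the hyperparameters echoed in the 0.13.1 log correspond to a trainer_config.yaml along these lines. The actual file was not posted (its path is elided as XXX), so this is a reconstruction from the log values only:

```yaml
UAVBehavior:
  trainer: ppo
  batch_size: 1024
  beta: 0.005
  buffer_size: 10240
  epsilon: 0.2
  hidden_units: 128
  lambd: 0.95
  learning_rate: 0.0003
  learning_rate_schedule: linear
  max_steps: 5.0e3
  memory_size: 256
  normalize: false
  num_epoch: 3
  num_layers: 2
  time_horizon: 64
  sequence_length: 64
  summary_freq: 1000
  use_recurrent: false
  vis_encode_type: simple
  reward_signals:
    extrinsic:
      strength: 1.0
      gamma: 0.99
```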

Environment (please complete the following information):

NOTE: We are unable to help reproduce bugs with custom environments. Please attempt to reproduce your issue with one of the example environments, or provide a minimal patch to one of the environments needed to reproduce the issue.

xiaomaogy commented 4 years ago

Hi @TouraisDavid, could you please give us your timers.json file in the summaries folder? It would help us debug and find out why this upgrade caused it to go slower.
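As an aside, a timer dump like that can be summarized with a short script. This is only a sketch: it assumes each timer node carries a "total" (seconds), a "count", and a nested "children" dict, which matches the 0.13.x timers.json output we have seen; adjust the key names if your file differs.

```python
import json  # used in the commented usage example below

def hot_nodes(node, name="root", path=""):
    """Recursively yield (path, total_seconds, count) for every timer node.

    Assumes the ML-Agents 0.13.x timers.json layout: each node is a dict
    with optional "total", "count", and "children" keys.
    """
    full = f"{path}/{name}" if path else name
    yield (full, node.get("total", 0.0), node.get("count", 0))
    for child_name, child in node.get("children", {}).items():
        yield from hot_nodes(child, child_name, full)

def top_timers(timers, n=5):
    """Return the n timer nodes with the largest cumulative time."""
    return sorted(hot_nodes(timers), key=lambda t: -t[1])[:n]

# Usage against the attached dump (filename taken from the comment above):
# with open("uav_ml_013-0_timers.json") as f:
#     for node_path, total, count in top_timers(json.load(f)):
#         print(f"{total:8.3f}s  x{count:<6}  {node_path}")
```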

TouraisDavid commented 4 years ago

ml-agents_0.13.1_log_2.txt uav_ml_013-0_timers.json.txt

Hi Vincent,

Please find attached the timers.json file for ml-agents 0.13.1 (there does not seem to be a timers.json for ml-agents 0.8?) and the associated Python console log.

Best regards,

David

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 42 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.