Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Tensorboard Graphs Overwritten When Launching Sequential Training Sessions with RunOptions, run_training & run_cli #5188

Closed ohernpaul closed 2 years ago

ohernpaul commented 3 years ago

Sorry in advance for all the long logs. And thank you for any help/insights!

Describe the bug Important: I am running a custom script patched together from ml-agents Python functions. The script runs in an IDE (Spyder) and works without the issues described below when the phases are launched from different kernels (i.e. kernel 1 finishes phase 1 -> kill kernel 1 -> start kernel 2 on phase 2 ==> no TensorBoard graph issues).

I have been building what I call a "PhaseLauncher": an automated way of launching training sessions that use different config files (YAML) sequentially, primarily so I don't have to manually launch runs with initialize-from.
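The core of the launcher is just a loop over configs; here is a stripped-down sketch (assuming the ml-agents Python API I'm patching together: `load_config` from `mlagents.trainers.cli_utils`, `RunOptions.from_dict`, `run_cli` from `mlagents.trainers.learn`, and `initialize_from` sitting under `checkpoint_settings` — the full script is further down this thread):

```python
# Stripped-down sketch of the phase launcher: each phase loads its own YAML
# and initializes from the previous phase's run id (the --initialize-from flag).
from typing import Any, Dict

from mlagents.trainers.settings import RunOptions
from mlagents.trainers.cli_utils import load_config
from mlagents.trainers.learn import run_cli


def launch_phases(config_paths, results_dir):
    prev_run_id = None
    for phase, config_path in enumerate(config_paths, start=1):
        run_id = f"phase_{phase}"
        configured: Dict[str, Any] = {
            "checkpoint_settings": {}, "env_settings": {},
            "engine_settings": {}, "torch_settings": {},
        }
        configured.update(load_config(config_path))
        configured["checkpoint_settings"]["run_id"] = run_id
        configured["checkpoint_settings"]["results_dir"] = results_dir
        if prev_run_id is not None:
            configured["checkpoint_settings"]["initialize_from"] = prev_run_id
        run_cli(RunOptions.from_dict(configured))  # same process, phase after phase
        prev_run_id = run_id
```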

The code works in general, but I am seeing issues with how TensorBoard updates the graphs. The first phase (no initialize-from) completes and its graph looks perfectly normal, but as the new phase's graph is updated (at each summary_freq interval), the old graph gets overwritten. See the picture below.

I have done some debugging by stepping through the entire process and believe that the issue is within SubprocessEnvManager.

Below are some logs from the training session. The first glaring issue: in the first run the program establishes a connection to a brain per num-envs, the hyperparameters from RunOptions are printed (once), and training starts. When training ends it first says "Connected new brain" (again), then shuts down the env. Reconnecting before run_training or run_cli returns seems fishy.

Next, I create a new RunOptions object from a new config file, set initialize-from to the previous run_id, then start training again. The issues I see in the logs: connection to the Unity env, connection to a new brain, env shut down, connection to the Unity env again, and then (from run_cli) two duplicate printouts of the hyperparameters from the phase-2 config file. The number of duplicates exactly matches the phase number, so phase 3 prints 3 duplicates of the hyperparameters.
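That N-duplicates-for-phase-N pattern smells like per-run state accumulating at module level across run_cli calls in the same process. A toy illustration of the failure mode I suspect (not ml-agents code, though mlagents/trainers/stats.py does seem to keep a class-level writers list, if I'm reading it right):

```python
# Toy illustration: a class-level writer registry that is appended to on every
# "run" but never cleared, so run N prints N copies of everything.
class StatsReporter:
    writers = []  # class-level: survives across runs within one process

    @staticmethod
    def add_writer(writer):
        StatsReporter.writers.append(writer)

    @staticmethod
    def write(msg):
        for w in StatsReporter.writers:
            w(msg)


def run_training_once(phase):
    StatsReporter.add_writer(lambda msg: print(f"[phase {phase} writer] {msg}"))
    StatsReporter.write("Hyperparameters ...")


for phase in (1, 2, 3):
    run_training_once(phase)  # phase 3 prints the hyperparameters 3 times
```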

In general it seems to work, but my fear is that the ONNX models are being overwritten too. I have not tested this yet.
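A quick way to check that, by hashing every exported .onnx under the results directory before and after a phase (plain Python; paths are just examples):

```python
import hashlib
from pathlib import Path


def onnx_hashes(results_dir: str):
    """Map each exported .onnx file to its SHA-256, for comparing across phases."""
    return {
        str(p): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(results_dir).rglob("*.onnx")
    }

# snapshot after phase 1, compare after phase 2:
# before = onnx_hashes(r"D:\wkspaces\AIsland\ml-agents-master\results\debug_tests")
# ... run phase 2 ...
# after = onnx_hashes(r"D:\wkspaces\AIsland\ml-agents-master\results\debug_tests")
# any changed hash under the phase-1 folder means it was overwritten
```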

Console logs / stack traces:

PHASE 1 Launch:

2021-03-25 14:10:38 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:10:38 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:10:39 INFO [environment.py:282] Connected new brain:
CarLearning?team=0
2021-03-25 14:10:39 INFO [environment.py:282] Connected new brain:
CarLearning?team=0
2021-03-25 14:10:39.498028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-03-25 14:10:42 INFO [stats.py:188] Hyperparameters for behavior name CarLearning: 
    trainer_type:   ppo
    hyperparameters:    
      batch_size:   1024
      buffer_size:  73728
      learning_rate:    0.0003
      beta: 0.01
      epsilon:  0.2
      lambd:    0.95
      num_epoch:    3
      learning_rate_schedule:   constant
    network_settings:   
      normalize:    False
      hidden_units: 512
      num_layers:   1
      vis_encode_type:  simple
      memory:   None
    reward_signals: 
      extrinsic:    
        gamma:  0.99
        strength:   1.0
        network_settings:   
          normalize:    False
          hidden_units: 128
          num_layers:   2
          vis_encode_type:  simple
          memory:   None
    init_path:  None
    keep_checkpoints:   5
    checkpoint_interval:    500000
    max_steps:  50000
    time_horizon:   512
    summary_freq:   5000
    threaded:   True
    self_play:  None
    behavioral_cloning: None

PHASE 1 Termination:

2021-03-25 14:11:15 INFO [model_serialization.py:183] Converting to D:\wkspaces\AIsland\ml-agents-master\results\debug_tests\no_gail_phase_1\CarLearning\CarLearning-53317.onnx
2021-03-25 14:11:15 INFO [model_serialization.py:195] Exported D:\wkspaces\AIsland\ml-agents-master\results\debug_tests\no_gail_phase_1\CarLearning\CarLearning-53317.onnx
2021-03-25 14:11:15 INFO [torch_model_saver.py:116] Copied D:\wkspaces\AIsland\ml-agents-master\results\debug_tests\no_gail_phase_1\CarLearning\CarLearning-53317.onnx to D:\wkspaces\AIsland\ml-agents-master\results\debug_tests\no_gail_phase_1\CarLearning.onnx.
2021-03-25 14:11:15 INFO [trainer_controller.py:81] Saved Model
2021-03-25 14:10:38 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:10:38 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:10:39 INFO [environment.py:282] Connected new brain:
CarLearning?team=0
2021-03-25 14:10:39 INFO [environment.py:282] Connected new brain:
CarLearning?team=0
2021-03-25 14:10:39.498028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-03-25 14:11:22 INFO [environment.py:429] Environment shut down with return code 0 (CTRL_C_EVENT).
2021-03-25 14:11:22 INFO [environment.py:429] Environment shut down with return code 0 (CTRL_C_EVENT).

PHASE 2 Launch:

2021-03-25 14:10:38 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:10:38 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:10:39 INFO [environment.py:282] Connected new brain:
CarLearning?team=0
2021-03-25 14:10:39 INFO [environment.py:282] Connected new brain:
CarLearning?team=0
2021-03-25 14:10:39.498028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-03-25 14:11:22 INFO [environment.py:429] Environment shut down with return code 0 (CTRL_C_EVENT).
2021-03-25 14:11:22 INFO [environment.py:429] Environment shut down with return code 0 (CTRL_C_EVENT).
2021-03-25 14:55:03 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:55:03 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:55:04 INFO [stats.py:188] Hyperparameters for behavior name CarLearning: 
    trainer_type:   ppo
    hyperparameters:    
      batch_size:   1024
      buffer_size:  73728
      learning_rate:    0.0003
      beta: 0.01
      epsilon:  0.2
      lambd:    0.95
      num_epoch:    3
      learning_rate_schedule:   constant
    network_settings:   
      normalize:    False
      hidden_units: 512
      num_layers:   1
      vis_encode_type:  simple
      memory:   None
    reward_signals: 
      extrinsic:    
        gamma:  0.99
        strength:   1.0
        network_settings:   
          normalize:    False
          hidden_units: 128
          num_layers:   2
          vis_encode_type:  simple
          memory:   None
    init_path:  D:\wkspaces\AIsland\ml-agents-master\results\debug_tests\no_gail_phase_1\CarLearning
    keep_checkpoints:   5
    checkpoint_interval:    500000
    max_steps:  50000
    time_horizon:   512
    summary_freq:   5000
    threaded:   True
    self_play:  None
    behavioral_cloning: None
2021-03-25 14:55:04 INFO [stats.py:188] Hyperparameters for behavior name CarLearning: 
    trainer_type:   ppo
    hyperparameters:    
      batch_size:   1024
      buffer_size:  73728
      learning_rate:    0.0003
      beta: 0.01
      epsilon:  0.2
      lambd:    0.95
      num_epoch:    3
      learning_rate_schedule:   constant
    network_settings:   
      normalize:    False
      hidden_units: 512
      num_layers:   1
      vis_encode_type:  simple
      memory:   None
    reward_signals: 
      extrinsic:    
        gamma:  0.99
        strength:   1.0
        network_settings:   
          normalize:    False
          hidden_units: 128
          num_layers:   2
          vis_encode_type:  simple
          memory:   None
    init_path:  D:\wkspaces\AIsland\ml-agents-master\results\debug_tests\no_gail_phase_1\CarLearning
    keep_checkpoints:   5
    checkpoint_interval:    500000
    max_steps:  50000
    time_horizon:   512
    summary_freq:   5000
    threaded:   True
    self_play:  None
    behavioral_cloning: None
2021-03-25 14:55:04 INFO [torch_model_saver.py:98] Starting training from step 0 and saving to D:\wkspaces\AIsland\ml-agents-master\results\debug_tests\no_gail_phase_2\CarLearning.

2021-03-25 14:10:38 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:10:38 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:10:39 INFO [environment.py:282] Connected new brain:
CarLearning?team=0
2021-03-25 14:10:39 INFO [environment.py:282] Connected new brain:
CarLearning?team=0
2021-03-25 14:10:39.498028: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-03-25 14:11:22 INFO [environment.py:429] Environment shut down with return code 0 (CTRL_C_EVENT).
2021-03-25 14:11:22 INFO [environment.py:429] Environment shut down with return code 0 (CTRL_C_EVENT).
2021-03-25 14:55:03 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:55:03 INFO [environment.py:113] Connected to Unity environment with package version 1.9.0-preview and communication version 1.5.0
2021-03-25 14:55:04 INFO [environment.py:282] Connected new brain:
CarLearning?team=0
2021-03-25 14:55:04 INFO [environment.py:282] Connected new brain:
CarLearning?team=0
2021-03-25 14:55:09 INFO [stats.py:180] CarLearning. Step: 5000. Time Elapsed: 2689.398 s. Mean Reward: -1.050. Std of Reward: 0.000.

Screenshots: [image attached in the original issue: TensorBoard graph from phase 1 being overwritten by phase 2]

Environment:

NOTE: We are unable to help reproduce bugs with custom environments. Please attempt to reproduce your issue with one of the example environments, or provide a minimal patch to one of the environments needed to reproduce the issue.

surfnerd commented 3 years ago

Hi @ohernpaul, This is a lot to take in so let me make sure I understand you correctly before moving forward. From what I've read, you are launching sequential training steps in one process without shutting down the previous training process. You are doing this by creating RunOption instances with updated information you want to use for your next phase of training.

Does that sound accurate to you?

ohernpaul commented 3 years ago

> Hi @ohernpaul, This is a lot to take in so let me make sure I understand you correctly before moving forward. From what I've read, you are launching sequential training steps in one process without shutting down the previous training process. You are doing this by creating RunOption instances with updated information you want to use for your next phase of training.
>
> Does that sound accurate to you?

Chris, thanks for the quick response. Yes, sequential training sessions where each session uses initialize-from to start the new phase with knowledge from the previous phase. I hacked together this automation pipeline using the RunOptions and run_training suggestions from my previous post about hyperparameters. My guess is that the session termination is not cleaning up correctly.

Outline:

I can provide more info if needed!

ohernpaul commented 3 years ago

I've been watching training progress from phase to phase and it seems to be learning correctly, so the problem is probably limited to TensorBoard (the StatsWriter?).
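For what it's worth, the visual effect is easy to reproduce with TensorBoard alone: if a second writer logs into the same run directory with steps restarting at 0, the curves overdraw each other in the UI. A minimal sketch (using torch.utils.tensorboard, which I believe is what ml-agents writes summaries with):

```python
# Two sequential "phases" logging to the SAME logdir with steps restarting at 0:
# TensorBoard displays the second run overdrawing the first, like the screenshot.
from torch.utils.tensorboard import SummaryWriter

logdir = "results/demo/CarLearning"  # hypothetical path

w1 = SummaryWriter(logdir)
for step in range(0, 50000, 5000):
    w1.add_scalar("Environment/Cumulative Reward", -1.0 + step / 50000, step)
w1.close()

w2 = SummaryWriter(logdir)            # second phase reuses the directory
for step in range(0, 50000, 5000):    # steps restart at 0
    w2.add_scalar("Environment/Cumulative Reward", 0.5 * step / 50000, step)
w2.close()
```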

Update: My temporary fix is to copy the previous phase's result directory to a new location before the next phase starts training and overwrites the result values.
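Concretely, something like this between phases (the copytree is the actual workaround; the StatsReporter line is an untested guess, assuming mlagents.trainers.stats.StatsReporter really keeps a class-level writers list):

```python
import shutil

from mlagents.trainers.stats import StatsReporter

results_dir = 'results\\reproduce_test\\'       # example paths
backup_dir = 'results\\reproduce_test_copy\\'
run_id = '3dball_ppo_1337'

# snapshot phase N's results before phase N+1 starts writing into them
shutil.copytree(results_dir + run_id, backup_dir + run_id)

# untested: drop writers left over from the previous phase so the next
# run_cli call doesn't also log through stale TensorBoard writers
StatsReporter.writers.clear()
```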

surfnerd commented 3 years ago

Cool, thanks for clarifying. I'm talking about this with the team. We will get back to you soon.

ohernpaul commented 3 years ago

I reproduced the issue using 3DBall, if that helps.

I also stepped through with a debugger and found that env_manager.close() appears to connect to a Unity environment and brain before closing (a generic sketch of the pattern follows the trace):

  1. env_manager.close()
  2. self.conn.send(EnvironmentRequest(EnvironmentCommand.CLOSE))
  3. PipeConnection._send_bytes(self,buf)
  4. ov, err = _winapi.WriteFile(...) <-- creates unity env and connects to brain before closing?
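For context, the shape of that close handshake, as a generic sketch of a parent process sending a CLOSE command to an env worker over a multiprocessing Pipe (not the actual SubprocessEnvManager code):

```python
# Generic sketch of the close handshake: the parent writes a CLOSE request into
# the pipe (the WriteFile call in step 4 above), and the worker shuts down
# its environment once it receives the command.
import multiprocessing as mp
from enum import Enum


class EnvironmentCommand(Enum):
    STEP = 1
    CLOSE = 2


def worker(conn):
    while True:
        cmd = conn.recv()
        if cmd == EnvironmentCommand.CLOSE:
            # a real worker would shut down its Unity env here
            conn.close()
            break


if __name__ == "__main__":
    parent_conn, child_conn = mp.Pipe()
    proc = mp.Process(target=worker, args=(child_conn,))
    proc.start()
    parent_conn.send(EnvironmentCommand.CLOSE)  # step 2 in the trace above
    proc.join()
```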
Repro script:

```python
import shutil
import time
from typing import Any, Dict

# ml-agents Python-package imports (module paths as of the ml-agents master
# this script was written against -- adjust if your layout differs)
from mlagents.trainers.settings import RunOptions
from mlagents.trainers.cli_utils import load_config
from mlagents.trainers.learn import run_cli  # run_training lives here too


class PhaseLauncher:
    def __init__(self):

        ##########################################################
        self.wkspace_dir = 'path\\to\\wkspace\\'
        self.mlagents_dir = self.wkspace_dir + 'ml-agents-master\\'
        self.config_dir = self.mlagents_dir + 'config\\'
        self.builds_dir = self.mlagents_dir + 'builds\\'
        self.results_dir = self.mlagents_dir + 'results\\reproduce_test\\'

        self.results_dir_cpy = self.mlagents_dir + 'results\\reproduce_test_copy\\'
        ##########################################################
        self.phase = 0
        self.run_id = '3dball_ppo'

        self.quality = 1
        self.height = 300
        self.width = 300
        self.no_graphics = False
        self.use_init_from = False
        self.use_env = True
        self.nb_envs = 5
        self.do_inference = False
        self.init_from = ''
        self.loop_counter = 0
        self.seed = 0

        self.phase_config_dir = self.config_dir + 'debug_phases\\'

        self.runs_array = []

        ##########################################################

    def Start(self):

        # loop to represent phases
        for i in range(2):
            # build RunOptions for this phase and launch training
            print("---Phase Start---")
            if self.loop_counter == 0:
                self.seed = 1337
                self.run_id = self.run_id + '_' + str(self.seed)
            else:
                self.seed = 101
                # strip the previous seed suffix, keep the base run id
                self.run_id = self.run_id.rsplit('_', 1)[0] + '_' + str(self.seed)

            self.GetRunOptions(self.phase_config_dir + '3DBall.yaml')
            self.loop_counter += 1  # advance to the next phase

    ##########################################################
    def GetRunOptions(self, phase_config_path):
        """
        RUN OPTIONS SECTION

        (most pulled from settings.py in mlagents/trainers)
        """
        #==================================================
        print("---Building Run Options---")

        # define config dict
        configured_dict: Dict[str, Any] = {
            "checkpoint_settings": {},
            "env_settings": {},
            "engine_settings": {},
            "torch_settings": {},
        }

        # fill dict with params defined in the yaml file
        configured_dict.update(load_config(phase_config_path))
        #==================================================

        # fill what would be CLI args with values defined in the script
        configured_dict["checkpoint_settings"]['run_id'] = self.run_id
        configured_dict["checkpoint_settings"]['results_dir'] = self.results_dir
        configured_dict["checkpoint_settings"]['force'] = True
        configured_dict["checkpoint_settings"]['inference'] = self.do_inference

        configured_dict["engine_settings"]['width'] = self.width
        configured_dict["engine_settings"]['height'] = self.height
        configured_dict["engine_settings"]['quality_level'] = self.quality
        configured_dict["engine_settings"]['no_graphics'] = self.no_graphics

        configured_dict["env_settings"]['env_path'] = self.builds_dir + '3DBall'
        configured_dict["env_settings"]['num_envs'] = self.nb_envs
        #==================================================

        final_runoptions = RunOptions.from_dict(configured_dict)

        self.RunTraining(final_runoptions)

    ##########################################################
    def RunTraining(self, run_options):
        # run_training(self.seed, run_options)
        run_cli(run_options)

        # snapshot this phase's results before the next phase can touch them
        shutil.copytree(self.results_dir + self.run_id, self.results_dir_cpy + self.run_id)

        time.sleep(2)

    ##########################################################


if __name__ == "__main__":
    pl = PhaseLauncher()
    pl.Start()
```
surfnerd commented 3 years ago

Hey @ohernpaul, We have logged this issue internally as MLA-812 and will update this thread once work is complete on it. Thank you for your very detailed feedback!

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 28 days. It will be closed in the next 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 42 days. If this issue is still valid, please ping a maintainer. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.