Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents
Other
17.19k stars 4.16k forks source link

Batcher.cs Exception #2391

Closed Peng2017 closed 5 years ago

Peng2017 commented 5 years ago

Describe the bug NullReferenceException: Object reference not set to an instance of an object MLAgents.Batcher.SendAcademyParameters (MLAgents.CommunicatorObjects.UnityRLInitializationOutput academyParameters) (at Assets/ML-Agents/Scripts/Batcher.cs:96) MLAgents.Academy.InitializeEnvironment () (at Assets/ML-Agents/Scripts/Academy.cs:345) MLAgents.Academy.Awake () (at Assets/ML-Agents/Scripts/Academy.cs:250)

To Reproduce Steps to reproduce the behavior:

  1. I use public override void InitializeAcademy() method in my academy
  2. when I click "run", I got the above Exception I walk through the Batcher.CS source code, and find there might be a code error: start from line 75: public CommunicatorObjects.UnityRLInitializationInput SendAcademyParameters( CommunicatorObjects.UnityRLInitializationOutput academyParameters) { CommunicatorObjects.UnityInput input; var initializationInput = new CommunicatorObjects.UnityInput(); try { initializationInput = m_communicator.Initialize( new CommunicatorObjects.UnityOutput { RlInitializationOutput = academyParameters }, out input); } catch { throw new UnityAgentsException( "The Communicator was unable to connect. Please make sure the External " + "process is ready to accept communication with Unity."); }

        var firstRlInput = input.RlInput;

here we can see 'input' is not an instance. I think that's why I got the Exception. Screenshots If applicable, add screenshots to help explain your problem.

Environment (please complete the following information):

NOTE: We are unable to help reproduce bugs with custom environments. Please attempt to reproduce your issue with one of the example environments, or provide a minimal patch to one of the environments needed to reproduce the issue.

awjuliani commented 5 years ago

Hi @Peng2017

Can you give a little more context about under what situation you are using ML-Agents? Are you attempting to perform training, or inference? Have you walked through the getting started tutorial before?

awjuliani commented 5 years ago

No worries. Glad you were able to get things working for you.

Peng2017 commented 5 years ago

Sorry , bug is still there when I try to broadcast my brain

Peng2017 commented 5 years ago

Each time I Instantiate a GameObject Prefab with Agent.cs and Brain pre-combined with it, need I GiveBrain to the script again?

awjuliani commented 5 years ago

That is correct. You will either need to instantiate the agent already with a brain, or you will need to give the brain to the agent once it is instantiated.

Can I ask a little more about your use-case? Are you trying to collect data using Unity, and hence the broadcast feature?

Peng2017 commented 5 years ago

Bugs

awjuliani commented 5 years ago

Hi @Peng2017

You have control checked on the brain in the academy, this means that you cannot run from the editor without first launching the external python process to communicate with the brain and academy. Here is a document describing how to perform behavioral cloning, which I think might be relevant for you: https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-Behavioral-Cloning.md

Peng2017 commented 5 years ago

My Asset, a pretty simple scene as above Asset.zip

Peng2017 commented 5 years ago

(mlagents) D:\0Unity\ml-agents-0.9.0>mlagents-learn config\trainer_config.yaml --run-id=Test --train that's what I typed before run unity project. I forgot to get a screenshot.

Peng2017 commented 5 years ago

my python environment is correctly installed and has run ml agents training successfully several times before

Peng2017 commented 5 years ago

I figured out this:

  1. whatever v0.9 example files I opened, I cannot train with the same exception message above.
  2. I turned back to mlagents v0.8 and everything runs well it seems like a environment conflict
Peng2017 commented 5 years ago

I saw other people met the same issue as I described.

awjuliani commented 5 years ago

Do you get any errors on the python terminal side?

Peng2017 commented 5 years ago

It was my fault. I reinstalled anaconda-envs and updated mlagents up to date, now everything runs OK! Thank you for your help!

BenDerPan commented 5 years ago

@awjuliani I have the same problem, and my mlagents is v0.10.1

my unity side errors:

image image

my python terminal side errors:

(env-ml-agents) D:\Code\ML-Agents-Demo>mlagents-learn trainer_config.yaml --run-id=RollerBall-1 --train
2019-10-30 18:51:18.501696: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
2019-10-30 18:51:18.505223: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

                        ▄▄▄▓▓▓▓
                   ╓▓▓▓▓▓▓█▓▓▓▓▓
              ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
            ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
          ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
        ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
        ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
          ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
            '▀▓▓▓▄      ^▓▓▓  ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
               ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
                   `▀█▓▓▓▓▓▓▓▓▓▌
                        ¬`▀▀▀█▓

INFO:mlagents.trainers:CommandLineOptions(debug=False, num_runs=1, seed=-1, env_path=None, run_id='RollerBall-1', load_model=False, train_model=True, save_freq=50000, keep_checkpoints=5, base_port=5005, num_envs=1, curriculum_folder=None, lesson=0, slow=False, no_graphics=False, multi_gpu=False, trainer_config_path='trainer_config.yaml', sampler_file_path=None, docker_target_name=None, env_args=None)
INFO:mlagents.envs:Start training by pressing the Play button in the Unity Editor.
INFO:mlagents.envs:
'Academy' started successfully!
Unity Academy name: Academy
        Number of Brains: 2
        Number of Training Brains : 1
        Reset Parameters :

Unity brain name: RollerBallBrain
        Number of Visual Observations (per agent): 0
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): [2]
        Vector Action descriptions: ,
Unity brain name: RollerBallPlayer
        Number of Visual Observations (per agent): 0
        Vector Observation space size (per agent): 8
        Number of stacked Vector Observation: 1
        Vector Action space type: continuous
        Vector Action space size (per agent): [2]
        Vector Action descriptions: ,
WARNING:tensorflow:From d:\github\ml-agents\ml-agents\mlagents\trainers\trainer.py:59: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

WARNING:tensorflow:From d:\github\ml-agents\ml-agents\mlagents\trainers\trainer.py:59: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

WARNING:tensorflow:From d:\github\ml-agents\ml-agents\mlagents\trainers\tf_policy.py:64: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From d:\github\ml-agents\ml-agents\mlagents\trainers\tf_policy.py:64: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From d:\github\ml-agents\ml-agents\mlagents\trainers\tf_policy.py:71: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From d:\github\ml-agents\ml-agents\mlagents\trainers\tf_policy.py:71: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-10-30 18:51:30.719320: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-10-30 18:51:30.725398: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-10-30 18:51:30.743335: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce 210 major: 1 minor: 2 memoryClockRate(GHz): 1.23
pciBusID: 0000:01:00.0
2019-10-30 18:51:30.750190: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_100.dll'; dlerror: cudart64_100.dll not found
2019-10-30 18:51:30.756172: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cublas64_100.dll'; dlerror: cublas64_100.dll not found
2019-10-30 18:51:30.761195: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cufft64_100.dll'; dlerror: cufft64_100.dll not found
2019-10-30 18:51:30.766731: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'curand64_100.dll'; dlerror: curand64_100.dll not found
2019-10-30 18:51:30.771141: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusolver64_100.dll'; dlerror: cusolver64_100.dll not found
2019-10-30 18:51:30.776150: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cusparse64_100.dll'; dlerror: cusparse64_100.dll not found
2019-10-30 18:51:30.781063: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
2019-10-30 18:51:30.784305: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1641] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2019-10-30 18:51:30.793744: F tensorflow/stream_executor/cuda/cuda_driver.cc:404] Check failed: CUDA_SUCCESS == cuDevicePrimaryCtxGetState(device, &former_primary_context_flags, &former_primary_context_is_active) (0 vs. 303)
Process Process-1:
Traceback (most recent call last):
  File "C:\Users\DDos\.conda\envs\env-ml-agents\lib\multiprocessing\connection.py", line 312, in _recv_bytes
    nread, err = ov.GetOverlappedResult(True)
BrokenPipeError: [WinError 109] 管道已结束。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\DDos\.conda\envs\env-ml-agents\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\DDos\.conda\envs\env-ml-agents\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "d:\github\ml-agents\ml-agents-envs\mlagents\envs\subprocess_env_manager.py", line 89, in worker
    cmd: EnvironmentCommand = parent_conn.recv()
  File "C:\Users\DDos\.conda\envs\env-ml-agents\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Users\DDos\.conda\envs\env-ml-agents\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError
BenDerPan commented 5 years ago

@awjuliani fixed, the problem of tensorflow version. :)

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.