Unity-Technologies / ml-agents

The Unity Machine Learning Agents Toolkit (ML-Agents) is an open-source project that enables games and simulations to serve as environments for training intelligent agents using deep reinforcement learning and imitation learning.
https://unity.com/products/machine-learning-agents

Receiving Unknown Communication Error between Python #3919

Closed XavierGeerinck closed 4 years ago

XavierGeerinck commented 4 years ago

Hi,

While building and trying to run ML-Agents under Linux, I always run into an "Unknown communication error between Python": Unknown communication error between Python. Python communication protocol: 1.0.0, Python library version: 0.16.0.

This is happening on the latest release (Release 1).

When digging a bit deeper, it seems to come from line 150 (see link below) in the RpcCommunicator, which throws when initializationInput is not null and the input is null.

I think this might be due to the fact that I am running this in a container, but I would like that verified. Can anyone help me with this?

Note: This also appears when running on Windows or on Windows WSL 2 (neither works).

Code Reference: https://github.com/Unity-Technologies/ml-agents/blob/751232a087f8dd8f836f2914571b182a4f6d59d1/com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs#L150

surfnerd commented 4 years ago

Hi @thebillkidy, do you happen to have Python terminal logs you can share? Off the top of my head I can't think of what it might be.

XavierGeerinck commented 4 years ago

@surfnerd: Thanks for your swift reply, here are the terminal logs I see:

xavier@<MASKED>:/mnt/f/project-reinforcement-learning/src/Servers/ML-Agents$ python3 test.py
Found path: /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall.x86_64
Mono path[0] = '/mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Managed'
Mono config path = '/mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/MonoBleedingEdge/etc'
Preloaded 'libgrpc_csharp_ext.x64.so'
Initialize engine version: 2019.3.11f1 (ceef2d848e70)
[Subsystems] Discovering subsystems at path /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/UnitySubsystems
Forcing GfxDevice: Null
GfxDevice: creating device client; threaded=0
NullGfxDevice:
    Version:  NULL 1.0 [1.0]
    Renderer: Null Device
    Vendor:   Unity Technologies
Begin MonoManager ReloadAssembly
- Completed reload, in  3.616 seconds
WARNING: Shader Unsupported: 'Autodesk Interactive' - All passes removed
WARNING: Shader Did you use #pragma only_renderers and omit this platform?
UnloadTime: 5.149800 ms
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib
Unknown communication error between Python. Python communication protocol: 1.0.0, Python library version: 0.17.0.dev0.
(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

Couldn't connect to trainer on port 5005 using API version 1.0.0. Will perform inference instead.
(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)
graybob commented 4 years ago

Hi, this happened to me when I used the master branch; use the tag release_1 instead and give it a try: git clone --branch release_1 https://github.com/Unity-Technologies/ml-agents.git

Thanks

XavierGeerinck commented 4 years ago

I am indeed utilizing the download from here: https://github.com/Unity-Technologies/ml-agents/archive/release_1.zip :)

surfnerd commented 4 years ago

@thebillkidy are you running both python and unity in the container?

XavierGeerinck commented 4 years ago

This happens both when:

surfnerd commented 4 years ago

Hi @thebillkidy, I noticed this in your Python log: Python library version: 0.17.0.dev0. This tells me you are working off of master and not off of release_1 in your Python codebase. The communication should still work, though.

surfnerd commented 4 years ago

What's the code in your main.py? Do you override the base port?
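For context, the Python UnityEnvironment constructor takes a base_port argument (the logs above show the default of 5005 for built environments). A minimal sketch of picking a known-free port to pass in; the UnityEnvironment call itself is illustrative only, since it needs mlagents_envs and a built environment:

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

port = find_free_port()
print(port)

# Hypothetical usage -- requires mlagents_envs and a built environment:
# from mlagents_envs.environment import UnityEnvironment
# env = UnityEnvironment("./envs/3DBall.x86_64", base_port=port,
#                        seed=1, side_channels=[])
```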

XavierGeerinck commented 4 years ago

😱 Omg, thanks a lot! I will try re-downloading and retrying right away! Very strange though, since I'm using the download from release_1... Maybe it's cached somewhere.

BTW, I do not override the base port; it's just two lines, an import and the UnityEnvironment start pointing to the compiled scene.


XavierGeerinck commented 4 years ago

Hi @surfnerd, just checked it and it's still the same (Python library version 0.16.0 now :))

logs:

Found path: /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall.x86_64
Mono path[0] = '/mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Managed'
Mono config path = '/mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/MonoBleedingEdge/etc'
Preloaded 'libgrpc_csharp_ext.x64.so'
Initialize engine version: 2019.3.11f1 (ceef2d848e70)
[Subsystems] Discovering subsystems at path /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/UnitySubsystems
Forcing GfxDevice: Null
GfxDevice: creating device client; threaded=0
NullGfxDevice:
    Version:  NULL 1.0 [1.0]
    Renderer: Null Device
    Vendor:   Unity Technologies
Begin MonoManager ReloadAssembly
- Completed reload, in  2.466 seconds
WARNING: Shader Unsupported: 'Autodesk Interactive' - All passes removed
WARNING: Shader Did you use #pragma only_renderers and omit this platform?
UnloadTime: 4.327100 ms
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libcoreclr.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /mnt/f/project-reinforcement-learning/src/Servers/ML-Agents/./envs/3DBall_Data/Mono/libSystem.dylib
Unknown communication error between Python. Python communication protocol: 1.0.0, Python library version: 0.16.0.
(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

Couldn't connect to trainer on port 5005 using API version 1.0.0. Will perform inference instead.
(Filename: ./Runtime/Export/Debug/Debug.bindings.h Line: 35)

Note: I start it with python3 test.py, with the following content:

from mlagents_envs.environment import UnityEnvironment
unity_env = UnityEnvironment(f"./envs/3DBall.x86_64", seed=1, side_channels=[])
surfnerd commented 4 years ago

I know this is asking a lot, but I'm going to ask if you could print the exception in the C# code where we print the "Will perform inference instead." message.

This should be the place where you could catch an actual exception and print it: https://github.com/Unity-Technologies/ml-agents/blob/92163c8031a90891962baa12089e35187b8093b6/com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs#L162

So you'd need to add (Exception e) to the catch portion, then Debug.Log(e.Message) or whatever you see fit.

XavierGeerinck commented 4 years ago

So I added two exception handlers, one in the RpcCommunicator and one in the Academy (where it logs the "Will perform inference instead" message). This is what I got (wrapped with DEBUG ON and DEBUG OFF):

RpcCommunicator.cs

DEBUG ON
UnityEngine.DebugLogHandler:Internal_Log(LogType, LogOption, String, Object)
UnityEngine.DebugLogHandler:LogFormat(LogType, Object, String, Object[])
UnityEngine.Logger:Log(LogType, Object)
UnityEngine.Debug:LogWarning(Object)
Unity.MLAgents.RpcCommunicator:Initialize(CommunicatorInitParameters) (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Communicator\RpcCommunicator.cs:162)
Unity.MLAgents.Academy:InitializeEnvironment() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:378)
Unity.MLAgents.Academy:LazyInitialize() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:218)
Unity.MLAgents.Academy:.ctor() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:206)
Unity.MLAgents.<>c:<.cctor>b__80_0() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:78)
System.Lazy`1:CreateValue()
System.Lazy`1:LazyInitValue()
System.Lazy`1:get_Value()
Unity.MLAgents.Academy:get_Instance() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:93)
Unity.MLAgentsExamples.ProjectSettingsOverrides:Awake() (at F:\ml-agents\ml-agents-release_1\Project\Assets\ML-Agents\Examples\SharedAssets\Scripts\ProjectSettingsOverrides.cs:52)

(Filename: F Line: 0)

ICommunicator.Initialize() failed.
UnityEngine.DebugLogHandler:Internal_Log(LogType, LogOption, String, Object)
UnityEngine.DebugLogHandler:LogFormat(LogType, Object, String, Object[])
UnityEngine.Logger:Log(LogType, Object)
UnityEngine.Debug:LogWarning(Object)
Unity.MLAgents.RpcCommunicator:Initialize(CommunicatorInitParameters) (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Communicator\RpcCommunicator.cs:163)
Unity.MLAgents.Academy:InitializeEnvironment() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:378)
Unity.MLAgents.Academy:LazyInitialize() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:218)
Unity.MLAgents.Academy:.ctor() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:206)
Unity.MLAgents.<>c:<.cctor>b__80_0() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:78)
System.Lazy`1:CreateValue()
System.Lazy`1:LazyInitValue()
System.Lazy`1:get_Value()
Unity.MLAgents.Academy:get_Instance() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:93)
Unity.MLAgentsExamples.ProjectSettingsOverrides:Awake() (at F:\ml-agents\ml-agents-release_1\Project\Assets\ML-Agents\Examples\SharedAssets\Scripts\ProjectSettingsOverrides.cs:52)

(Filename: F Line: 0)

DEBUG OFF

Academy.cs

DEBUG ON
UnityEngine.DebugLogHandler:Internal_Log(LogType, LogOption, String, Object)
UnityEngine.DebugLogHandler:LogFormat(LogType, Object, String, Object[])
UnityEngine.Logger:Log(LogType, Object)
UnityEngine.Debug:LogWarning(Object)
Unity.MLAgents.Academy:InitializeEnvironment() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:399)
Unity.MLAgents.Academy:LazyInitialize() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:218)
Unity.MLAgents.Academy:.ctor() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:206)
Unity.MLAgents.<>c:<.cctor>b__80_0() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:78)
System.Lazy`1:CreateValue()
System.Lazy`1:LazyInitValue()
System.Lazy`1:get_Value()
Unity.MLAgents.Academy:get_Instance() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:93)
Unity.MLAgentsExamples.ProjectSettingsOverrides:Awake() (at F:\ml-agents\ml-agents-release_1\Project\Assets\ML-Agents\Examples\SharedAssets\Scripts\ProjectSettingsOverrides.cs:52)

(Filename: F Line: 0)

The Communicator was unable to connect. Please make sure the External process is ready to accept communication with Unity.
UnityEngine.DebugLogHandler:Internal_Log(LogType, LogOption, String, Object)
UnityEngine.DebugLogHandler:LogFormat(LogType, Object, String, Object[])
UnityEngine.Logger:Log(LogType, Object)
UnityEngine.Debug:LogWarning(Object)
Unity.MLAgents.Academy:InitializeEnvironment() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:400)
Unity.MLAgents.Academy:LazyInitialize() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:218)
Unity.MLAgents.Academy:.ctor() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:206)
Unity.MLAgents.<>c:<.cctor>b__80_0() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:78)
System.Lazy`1:CreateValue()
System.Lazy`1:LazyInitValue()
System.Lazy`1:get_Value()
Unity.MLAgents.Academy:get_Instance() (at F:\ml-agents\ml-agents-release_1\com.unity.ml-agents\Runtime\Academy.cs:93)
Unity.MLAgentsExamples.ProjectSettingsOverrides:Awake() (at F:\ml-agents\ml-agents-release_1\Project\Assets\ML-Agents\Examples\SharedAssets\Scripts\ProjectSettingsOverrides.cs:52)

(Filename: F Line: 0)

DEBUG OFF
XavierGeerinck commented 4 years ago

Update: going into RpcCommunicator.cs, m_Client.Exchange() returns a null value for unityInput, and when running m_Client.Exchange(WrapMessage(null, 200)) separately I get: Status(StatusCode=Unavailable, Detail="Connect Failed")
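As a side note, StatusCode=Unavailable with "Connect Failed" from gRPC usually just means nothing is accepting connections on the target port. A quick way to check that from Python; the host and port 5005 are assumptions taken from the logs above:

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP listener accepts a connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# The trainer in the logs above is expected on port 5005.
print(port_is_open("127.0.0.1", 5005))
```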

XavierGeerinck commented 4 years ago

After an offline discussion (thanks a lot for helping so far, @surfnerd!), current progress:

Python port check: OK
Python gRPC server start: OK

Currently the gRPC server seems to start and then stop almost instantly (checked with netstat -tulpn). My question is whether that's due to an error or due to not receiving commands. I remember that when I created a gRPC server myself, I had to add a while loop, since server.start() is not blocking.

AFAIK, when using gRPC I added a Servicer to the server, and at the bottom of the main server.py file I always added a try/while True/time.sleep loop to keep it alive.
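The non-blocking-start behavior described above can be illustrated with the standard library instead of grpc (an analogy, not ml-agents code): a server running on a daemon thread dies as soon as the main thread falls off the end of the script, which is why some blocking wait (grpc's server.wait_for_termination(), or the while/sleep loop described above) is needed to keep it alive.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Echo a single message back to the client.
        self.request.sendall(self.request.recv(1024))

# serve_forever() blocks, so it is pushed onto a daemon thread. Just like
# grpc's server.start() returning immediately, the main thread is now free,
# and if the script simply ended here, the daemon thread (and with it the
# server) would die.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

with socket.create_connection(server.server_address) as conn:
    conn.sendall(b"ping")
    reply = conn.recv(1024)  # b'ping'

server.shutdown()      # stop serve_forever()
server.server_close()  # release the listening socket
```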

Will continue tomorrow and update here.

XavierGeerinck commented 4 years ago

The - stupid - issue has been found! It happens when no .reset() call is made on the environment, i.e. when having only these lines:

from mlagents_envs.environment import UnityEnvironment
unity_env = UnityEnvironment(f"./envs/3DBall.x86_64", seed=1, side_channels=[])

The environment constructor is non-blocking, so the gRPC communicator server closes automatically, since it thinks it's "done" (which it actually is, of course...).

Transforming it into the following solves the issue, and I see an observation coming in!

from mlagents_envs.environment import UnityEnvironment
unity_env = UnityEnvironment(f"./envs/3DBall.x86_64", seed=1, side_channels=[])

env_info = unity_env.reset()
print(env_info)

Observation:

rl_output {
  agentInfos {
    key: "3DBall?team=0"
    value {
      value {
        observations {
          shape: 8
          float_data {
            data: -0.04766776040196419
            data: -0.08700116723775864
            data: -0.5429515838623047
            data: 4.0
            data: 0.11863136291503906
            data: 0.0
            data: 0.0
            data: 0.0
          }
        }
      }
      value {
        id: 1
        observations {
          shape: 8
          float_data {
            data: -0.04967791959643364
            data: -0.016459552571177483
            data: -1.2431511878967285
            data: 4.0
            data: 1.087937355041504
            data: 0.0
            data: 0.0
            data: 0.0
          }
        }
      }
      # ... trimmed
    }
  }
}
rl_initialization_output {
  brain_parameters {
    vector_action_size: 2
    vector_action_space_type: continuous
    brain_name: "3DBall?team=0"
    is_training: true
  }
}

So for me the main question is: is this intended, or should a feature be added that alerts the user that a server has been opened but no commands have been received yet? 😊 I would classify this as a "nice to have", but it would make things clearer 😄

surfnerd commented 4 years ago

Hey @thebillkidy, yes, calling reset() is the way to get the environment state. I will update the docs to add these lines of code to make that clearer. Thank you for your diligence. If you feel this issue is resolved, please close it and have fun with ml-agents!

XavierGeerinck commented 4 years ago

Awesome, thanks a lot! :) Will close it now.

github-actions[bot] commented 3 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.