dhyeythumar / ML-Agents-with-Google-Colab

Train reinforcement learning agent using ML-Agents with Google Colab.
https://dhyeythumar.medium.com/training-ml-agents-with-google-colab-cb166c3dca46
MIT License

mlagents_envs.exception.UnityEnvironmentException: Environment shut down with return code -6 (SIGABRT). #3

Open Pimool opened 11 months ago

Pimool commented 11 months ago

Hi, I am trying to run Reinforcement Learning on a GPU runbox.

With your code, I could train the model on Colab and on Saturn Cloud, which is similar to Colab.

However, when I tried to run it on my personal GPU runbox, an error occurred.

mlagents-learn -h showed the options, so I think the problem is with the environment rather than with the ml-agents installation.

How can I handle this error?

~$ mlagents-learn config.yaml --run-id=test --env=ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball.x86_64

Version information:
  ml-agents: 0.31.0.dev0,
  ml-agents-envs: 0.31.0.dev0,
  Communicator API: 1.5.0,
  PyTorch: 1.11.0+cu102
[INFO] Learning was interrupted. Please wait while the graph is generated.
Traceback (most recent call last):
  File "/home/desktop/venv/bin/mlagents-learn", line 33, in <module>
    sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 264, in main
    run_cli(parse_command_line())
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 260, in run_cli
    run_training(run_seed, options, num_areas)
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 136, in run_training
    tc.start_learning(env_manager)
  File "/home/desktop/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 197, in start_learning
    raise ex
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 172, in start_learning
    self._reset_env(env_manager)
  File "/home/desktop/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 105, in _reset_env
    env_manager.reset(config=new_config)
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/env_manager.py", line 68, in reset
    self.first_step_infos = self._reset_env(config)
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 446, in _reset_env
    ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
  File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 101, in recv
    raise env_exception
mlagents_envs.exception.UnityEnvironmentException: Environment shut down with return code -6 (SIGABRT).

dhyeythumar commented 11 months ago

Hi @Pimool, check your ml-agents version (the environment in this repo was built for release_1). Also, the SIGABRT return code suggests that the environment process exited with a critical error.
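
For readers following along: a negative return code from a subprocess conventionally means the process was terminated by the signal with that number, so -6 corresponds to signal 6 (SIGABRT on Linux). The sketch below decodes such a code and prints the installed package versions using only the Python standard library. The helper names `describe_return_code` and `installed_version` are hypothetical and are not part of ML-Agents.

```python
import signal
from importlib import metadata


def describe_return_code(code: int) -> str:
    """Translate a subprocess return code into a readable description.

    A negative code is interpreted as "killed by signal -code".
    """
    if code < 0:
        try:
            return signal.Signals(-code).name
        except ValueError:
            return f"unknown signal {-code}"
    return f"exit status {code}"


def installed_version(package: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    print(describe_return_code(-6))
    # Distribution names as published on PyPI.
    for name in ("mlagents", "mlagents-envs", "torch"):
        print(name, installed_version(name))
```

Running this inside the same virtual environment that launches mlagents-learn shows which versions the trainer actually uses.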

Pimool commented 11 months ago

Hi @dhyeythumar, I used ml-agents release 20 (the most recent one). However, your environment worked with release 20 on both Colab and Saturn Cloud, so I don't think the release is the problem. I don't know why the SIGABRT error occurs only on my personal GPU server.

dhyeythumar commented 11 months ago

Then most probably the environment itself is exiting with an error. It's possible that the Linux executable is not supported on GPU. If I remember correctly, on Colab this environment runs on a CPU instance; I haven't tried it on a GPU instance. Try it and see whether the GPU instance on Colab works.

Pimool commented 11 months ago

It works in Colab on a T4 GPU. Saturn Cloud was on a GPU, too. Here is the Colab notebook: https://colab.research.google.com/drive/1sFY_V-uirL9pCPBlHkme8zBMfp3e1cJQ?usp=sharing

Pimool commented 11 months ago

Below is the Player-0.log file produced when I try to start training with the command above. I don't understand the errors or why the fallback handler cannot load these libraries. Any help is greatly appreciated.

'''
Mono path[0] = '/home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Managed'
Mono config path = '/home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/MonoBleedingEdge/etc'
Preloaded 'lib_burst_generated.so'
Preloaded 'libgrpc_csharp_ext.x64.so'
Initialize engine version: 2019.3.15f1 (59ff3e03856d)
[Subsystems] Discovering subsystems at path /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/UnitySubsystems
Forcing GfxDevice: Null
GfxDevice: creating device client; threaded=0
NullGfxDevice:
    Version: NULL 1.0 [1.0]
    Renderer: Null Device
    Vendor: Unity Technologies
Begin MonoManager ReloadAssembly
Completed reload, in 0.142 seconds
WARNING: Shader Unsupported: 'Autodesk Interactive' - All passes removed
WARNING: Shader Did you use #pragma only_renderers and omit this platform?
UnloadTime: 1.141076 ms
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Caught fatal signal - signo:11 code:1 errno:0 addr:0x561114101530
Obtained 4 stack frames.
0 0x007f8bd7a1a520 in __sigaction
1 0x007f8bd66696b5 in grpc_completion_queue_create_internal(grpc_cq_completion_type, grpc_cq_polling_type)
2 0x007f8bd666abf0 in grpc_completion_queue_create_for_next
3 0x000000405aa870 in (wrapper managed-to-native) object:wrapper_native_0x7f8bd6657df0 ()
'''
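
A log like the one above is easier to compare across machines (for example, Colab versus a local server) when it is condensed first. The following is a minimal sketch that counts the "Fallback handler could not load library" messages per library name and extracts the fatal signal number. The function name `summarize_player_log` is hypothetical, and the regular expressions assume the message wording shown in this log. Note that the log reports signal 11 (SIGSEGV on Linux) with stack frames in grpc functions, while the trainer reported SIGABRT.

```python
import re
from collections import Counter

# Patterns assume the exact wording that appears in the Player-0.log above.
FALLBACK_RE = re.compile(r"Fallback handler could not load library (\S+)")
SIGNAL_RE = re.compile(r"Caught fatal signal - signo:(\d+)")


def summarize_player_log(text: str) -> dict:
    """Condense a Unity player log into missing libraries and fatal signal."""
    # Keep only the file name of each library path that failed to load.
    libs = Counter(path.rsplit("/", 1)[-1] for path in FALLBACK_RE.findall(text))
    match = SIGNAL_RE.search(text)
    return {
        "missing_libraries": dict(libs),
        "fatal_signo": int(match.group(1)) if match else None,
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py path/to/Player-0.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        print(summarize_player_log(handle.read()))
```

Running the same script on a log from a machine where training works would show whether the fallback-handler messages also appear there, which helps separate harmless noise from the actual cause of the crash.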

dhyeythumar commented 10 months ago

Hi @Pimool, try this command:

!mlagents-learn config.yaml --run-id=$run_id --env=$env_name --no-graphics

I guess on your server it's trying to render the environment.
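
To check whether rendering is involved at all, the executable can also be started by hand without a display. The sketch below assumes that the build accepts the standard Unity standalone player arguments `-batchmode`, `-nographics`, and `-logFile`; the helper names `build_headless_command` and `run_headless` are hypothetical. Because no trainer is connected in this setup, it may not reproduce a crash that only happens while the environment connects to mlagents-learn.

```python
import subprocess
from typing import List, Optional


def build_headless_command(executable: str, log_file: Optional[str] = None) -> List[str]:
    """Build an argument list that starts a Unity player without graphics."""
    cmd = [executable, "-batchmode", "-nographics"]
    if log_file is not None:
        # Write the player log to a known location for later inspection.
        cmd += ["-logFile", log_file]
    return cmd


def run_headless(executable: str, timeout_s: float = 30.0) -> Optional[int]:
    """Run the player briefly.

    Returns the process return code, or None if the process was still
    running after timeout_s seconds (in which case it is terminated).
    """
    proc = subprocess.Popen(build_headless_command(executable))
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()
        proc.wait()
        return None
```

A negative return value from `run_headless` would indicate that the player was killed by a signal even without the trainer attached; `None` would indicate that it kept running for the whole timeout.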

Pimool commented 10 months ago

Hi, @dhyeythumar

Thanks for your advice, but it raises the same error.

OmarVector commented 10 months ago

Yeah, we have the same problem, and we couldn't find a solution.