Open Pimool opened 11 months ago
Hi @Pimool, check your ml-agents version (the environment provided in this repo was built for release_1). Also, it seems your training exited with a critical error, as the SIGABRT error code suggests.
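One way to act on the version check is to print the installed package versions on both the working machine (Colab) and the failing server, then compare them. This is a minimal sketch using only the standard library; the list of packages to compare is an assumption, not something the repo prescribes.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed set of distributions worth comparing between machines.
DEFAULT_PACKAGES = ("mlagents", "mlagents-envs", "torch", "grpcio", "protobuf")


def installed_versions(packages=DEFAULT_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Running the script in both environments and diffing the output shows whether the two setups actually match.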
Hi @dhyeythumar, I used ml-agents release 20 (the most recent one). However, your environment worked with release 20 on both Colab and Saturn Cloud, so I don't think the release is the problem. I don't know why the SIGABRT error occurs only on my personal GPU server.
Then most probably the environment is exiting with an error; it's possible that the Linux executable is not supported on GPU. If I remember correctly, on Colab this env works on a CPU instance. I haven't tried it on GPU (try it and see whether a GPU instance on Colab works or not).
It works in Colab on a T4 GPU, and the Saturn Cloud run was on a GPU too. Here is the Colab notebook: https://colab.research.google.com/drive/1sFY_V-uirL9pCPBlHkme8zBMfp3e1cJQ?usp=sharing
Below is the Player-0.log file produced when I try to start training with the code above. I don't understand the errors, or why the fallback handler cannot load these libraries. Any help is greatly appreciated.
'''
Mono path[0] = '/home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Managed'
Mono config path = '/home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/MonoBleedingEdge/etc'
Preloaded 'lib_burst_generated.so'
Preloaded 'libgrpc_csharp_ext.x64.so'
Initialize engine version: 2019.3.15f1 (59ff3e03856d)
[Subsystems] Discovering subsystems at path /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/UnitySubsystems
Forcing GfxDevice: Null
GfxDevice: creating device client; threaded=0
NullGfxDevice:
    Version: NULL 1.0 [1.0]
    Renderer: Null Device
    Vendor: Unity Technologies
Begin MonoManager ReloadAssembly
Completed reload, in 0.142 seconds
WARNING: Shader Unsupported: 'Autodesk Interactive' - All passes removed
WARNING: Shader Did you use #pragma only_renderers and omit this platform?
UnloadTime: 1.141076 ms
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libcoreclr.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libdl.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib.so
Fallback handler could not load library /home/desktop/ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball_Data/Mono/libSystem.dylib
Caught fatal signal - signo:11 code:1 errno:0 addr:0x561114101530
Obtained 4 stack frames.
0 0x007f8bd7a1a520 in __sigaction
1 0x007f8bd66696b5 in grpc_completion_queue_create_internal(grpc_cq_completion_type, grpc_cq_polling_type)
2 0x007f8bd666abf0 in grpc_completion_queue_create_for_next
3 0x000000405aa870 in (wrapper managed-to-native) object:wrapper_native_0x7f8bd6657df0 ()
'''
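A log like this is easier to read once the repeated "Fallback handler" lines are collapsed and the fatal signal and stack frames are pulled out. The following is a minimal sketch of such a summary. It assumes the log file on disk has one entry per line, and the function name `summarize_player_log` and the regular expressions are illustrative, not part of ML-Agents or Unity.

```python
import re
from collections import Counter

# Patterns written against the log text quoted in this thread (an assumption
# about the general format, not an official specification).
FATAL_RE = re.compile(r"Caught fatal signal - signo:(\d+)")
FRAME_RE = re.compile(r"^\s*#?(\d+)\s+(0x[0-9a-fA-F]+) in (.+)$")
FALLBACK_RE = re.compile(r"Fallback handler could not load library (\S+)")


def summarize_player_log(text):
    """Return (signal number or None, stack-frame symbols, missing-library counts)."""
    signo = None
    frames = []
    missing = Counter()
    for line in text.splitlines():
        match = FATAL_RE.search(line)
        if match:
            signo = int(match.group(1))
            continue
        match = FRAME_RE.match(line)
        if match:
            frames.append(match.group(3).strip())
            continue
        match = FALLBACK_RE.search(line)
        if match:
            # Keep only the file name so identical libraries are counted together.
            missing[match.group(1).rsplit("/", 1)[-1]] += 1
    return signo, frames, missing
```

Usage would look like `summarize_player_log(open("Player-0.log").read())`, where the path to the log file depends on your setup. For the log above, the interesting part is that the top frames below the signal handler are gRPC functions.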
Hi @Pimool,
Try this command: !mlagents-learn config.yaml --run-id=$run_id --env=$env_name --no-graphics
I guess on your server it's trying to render the environment.
Hi @dhyeythumar,
Thanks for your advice, but it raises the same error.
Yeah, we have the same problem and we couldn't find any solution.
Hi, I am trying to run Reinforcement Learning on a GPU runbox.
With your code, I could train the model on Colab and on Saturn Cloud, which is similar to Colab.
However, when I tried to run it on my personal GPU runbox, an error occurred.
mlagents-learn -h showed the options, so I think the problem is with the environment.
How can I handle this error?
~$ mlagents-learn config.yaml --run-id=test --env=ML-Agents-with-Google-Colab/headless_build/3DBall_example/3dball.x86_64
Version information:
ml-agents: 0.31.0.dev0,
ml-agents-envs: 0.31.0.dev0,
Communicator API: 1.5.0,
PyTorch: 1.11.0+cu102
[INFO] Learning was interrupted. Please wait while the graph is generated.
Traceback (most recent call last):
File "/home/desktop/venv/bin/mlagents-learn", line 33, in <module>
sys.exit(load_entry_point('mlagents', 'console_scripts', 'mlagents-learn')())
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 264, in main
run_cli(parse_command_line())
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 260, in run_cli
run_training(run_seed, options, num_areas)
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/learn.py", line 136, in run_training
tc.start_learning(env_manager)
File "/home/desktop/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 197, in start_learning
raise ex
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 172, in start_learning
self._reset_env(env_manager)
File "/home/desktop/ml-agents/ml-agents-envs/mlagents_envs/timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/trainer_controller.py", line 105, in _reset_env
env_manager.reset(config=new_config)
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/env_manager.py", line 68, in reset
self.first_step_infos = self._reset_env(config)
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 446, in _reset_env
ew.previous_step = EnvironmentStep(ew.recv().payload, ew.worker_id, {}, {})
File "/home/desktop/ml-agents/ml-agents/mlagents/trainers/subprocess_env_manager.py", line 101, in recv
raise env_exception
mlagents_envs.exception.UnityEnvironmentException: Environment shut down with return code -6 (SIGABRT).
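The `-6` in the last line follows the convention of Python's `subprocess` module on POSIX: a negative return code `-N` means the child process was terminated by signal `N`. A minimal sketch for decoding such codes with the standard library is shown here; the helper name `describe_return_code` is hypothetical.

```python
import signal


def describe_return_code(code):
    """Translate a subprocess return code into a readable description."""
    if code < 0:
        # Negative codes mean "terminated by signal -code" on POSIX systems.
        try:
            return signal.Signals(-code).name
        except ValueError:
            return f"unknown signal {-code}"
    return f"exit status {code}"


if __name__ == "__main__":
    print(describe_return_code(-6))  # the code reported by mlagents-learn
    print(signal.Signals(11).name)   # the signo reported in Player-0.log
```

On Linux, signal 6 is SIGABRT and signal 11 is SIGSEGV, so the trainer message and the `signo:11` entry in the `Player-0.log` posted in this thread refer to two different signals.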