Closed wanghui589 closed 2 years ago
When I run train_mujoco.sh, the following error is generated:
NotImplementedError
Traceback (most recent call last):
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 163, in <module>
    main(sys.argv[1:])
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 136, in main
    envs = make_train_env(all_args)
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/train/train_mujoco.py", line 37, in make_train_env
    return ShareSubprocVecEnv([get_env_fn(i) for i in range(all_args.n_rollout_threads)])
  File "/home/spaci/RL/TRPO-in-MARL-master/scripts/../envs/env_wrappers.py", line 360, in __init__
    self.n_agents = self.remotes[0].recv()
  File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/spaci/anaconda3/envs/env_name/lib/python3.9/multiprocessing/connection.py", line 384, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Can anyone help me? Thanks.
I've run into the same error. It may be caused by setting the number of rollout threads or training threads too high, by your PC not being able to handle the current number of processes, or by many zombie processes lingering on the system. You can restart your HPC node or PC and try again. Hope this suggestion helps!
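For what it's worth, a ConnectionResetError raised by recv() in the parent usually only tells you that the process on the other end of the pipe went away before replying; the worker's own exception (possibly the NotImplementedError printed at the top of the log) tends to be the more useful clue. Here is a minimal sketch of that failure mode. It is not the repository's code: it simulates the dead worker inside a single process by closing the worker's end of the pipe, and the command name is made up for illustration.

```python
import multiprocessing as mp


def recv_from_dead_worker():
    """Simulate a worker that exits before answering a pending command."""
    # Duplex pipe, similar in spirit to what a subprocess vec-env uses.
    parent_end, worker_end = mp.Pipe()
    # Hypothetical command name; the parent asks and then waits for a reply.
    parent_end.send(("get_num_agents", None))
    # The "worker" goes away without reading the command or replying.
    worker_end.close()
    try:
        return parent_end.recv()
    except (EOFError, OSError) as exc:
        # On Linux this is typically ConnectionResetError (an OSError subclass);
        # other platforms may raise EOFError instead.
        return type(exc).__name__


result = recv_from_dead_worker()
print(result)
```

If this matches what you see, look for the first exception printed by the worker process rather than at the recv() traceback itself.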
Thank you for your answer. I have modified the relevant parameters, but the error still occurs. Can you help me? The parameters are now:
parser.add_argument("--n_training_threads", type=int, default=1, help="Number of torch threads for training")
parser.add_argument("--n_rollout_threads", type=int, default=2, help="Number of parallel envs for training rollouts")
I'm studying your awesome work, but I'm having some trouble. Can you help me? When I use ShareSubprocVecEnv, the above error appears. When I switch to ShareDummyVecEnv, a different error appears: AttributeError: 'ShareDummyVecEnv' object has no attribute 'n_agents'.
You also need to modify the same parameters (--n_training_threads and --n_rollout_threads) in train_mujoco.sh.
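The reason, as far as I can tell, is that argparse defaults only apply when a flag is not passed: if the launch script passes these flags on the command line, its values win over whatever you edit in the parser. A small sketch with the two arguments quoted above; the values 8 and 4 are made up for illustration and are not taken from train_mujoco.sh.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--n_training_threads", type=int, default=1,
                    help="Number of torch threads for training")
parser.add_argument("--n_rollout_threads", type=int, default=2,
                    help="Number of parallel envs for training rollouts")

# No flags given: the edited defaults apply.
defaults = parser.parse_args([])

# Flags given, as a launch script might pass them (illustrative values):
# the command-line values override the defaults.
overridden = parser.parse_args(
    ["--n_training_threads", "8", "--n_rollout_threads", "4"]
)

print(defaults.n_training_threads, defaults.n_rollout_threads)
print(overridden.n_training_threads, overridden.n_rollout_threads)
```

So it is worth checking which values the .sh file actually passes before concluding that the smaller settings did not help.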