allenai / embodied-clip

Official codebase for EmbCLIP
https://arxiv.org/abs/2111.09888
Apache License 2.0

[I need help] EOFError #8

Closed SmartAndCleverRobot closed 2 years ago

SmartAndCleverRobot commented 2 years ago

I am training the RoboTHOR ObjectNav DDPPO baseline on my Ubuntu 20.04 server with the following command:

PYTHONPATH=. python allenact/main.py -o storage/objectnav-robothor-rgb-clip-rn50 -b projects/objectnav_baselines/experiments/robothor/clip objectnav_robothor_rgb_clipresnet50gru_ddppo

After I configured the environment and started training according to the instructions, an EOFError was reported. I searched for a long time and could not find the cause. Could the authors give me some help? Thank you very much.

[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 INFO:] Worker 10 closing.       [vector_sampled_tasks.py: 377]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 INFO:] Worker 8 closing.        [vector_sampled_tasks.py: 377]
Process ForkServerProcess-1:11:
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 INFO:] Worker 9 closing.        [vector_sampled_tasks.py: 377]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 INFO:] Worker 6 closing.        [vector_sampled_tasks.py: 377]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 INFO:] Worker 14 closing.       [vector_sampled_tasks.py: 377]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 INFO:] Worker 11 closing.       [vector_sampled_tasks.py: 377]
Process ForkServerProcess-1:9:
Traceback (most recent call last):
[07/28 15:48:26 INFO:] Worker 2 closing.        [vector_sampled_tasks.py: 377]
[07/28 15:48:26 INFO:] Worker 5 closing.        [vector_sampled_tasks.py: 377]
Process ForkServerProcess-1:10:
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
Process ForkServerProcess-1:7:
[07/28 15:48:26 INFO:] Worker 1 closing.        [vector_sampled_tasks.py: 377]
[07/28 15:48:26 INFO:] Worker 4 closing.        [vector_sampled_tasks.py: 377]
[07/28 15:48:26 INFO:] Worker 3 closing.        [vector_sampled_tasks.py: 377]
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
Process ForkServerProcess-1:15:
[07/28 15:48:26 INFO:] Worker 13 closing.       [vector_sampled_tasks.py: 377]
Process ForkServerProcess-1:12:
Traceback (most recent call last):
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
Process ForkServerProcess-1:3:
[07/28 15:48:26 INFO:] Worker 7 closing.        [vector_sampled_tasks.py: 377]
Process ForkServerProcess-1:6:
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
Process ForkServerProcess-1:2:
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
[07/28 15:48:26 INFO:] Worker 0 closing.        [vector_sampled_tasks.py: 377]
Process ForkServerProcess-1:4:
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
Process ForkServerProcess-1:5:
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
Traceback (most recent call last):
Process ForkServerProcess-1:14:
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
Traceback (most recent call last):
Process ForkServerProcess-1:8:
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
Traceback (most recent call last):
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
Traceback (most recent call last):
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
Process ForkServerProcess-1:1:
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
Traceback (most recent call last):
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
Traceback (most recent call last):
EOFError
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
Traceback (most recent call last):
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
EOFError
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
EOFError
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
EOFError
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
EOFError
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
EOFError
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
        [vector_sampled_tasks.py: 371]
[07/28 15:48:26 INFO:] Worker 12 closing.       [vector_sampled_tasks.py: 377]
Process ForkServerProcess-1:13:
Traceback (most recent call last):
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 372, in _task_sampling_loop_worker
    raise e
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/vector_sampled_tasks.py", line 313, in _task_sampling_loop_worker
    read_input = connection_read_fn()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 255, in recv
    buf = self._recv_bytes()
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
  File "/home/lixinting/anaconda3/envs/embclip-allenact/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
EOFError
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 ERROR:] [train worker 1] Encountered RuntimeError, exiting.     [engine.py: 1595]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/engine.py", line 1587, in train
    self.run_pipeline()
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/engine.py", line 1433, in run_pipeline
    num_done = int(self.num_workers_done.get("done"))
RuntimeError: Broken pipe
        [engine.py: 1598]
Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/engine.py", line 1587, in train
    self.run_pipeline()
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/engine.py", line 1433, in run_pipeline
    num_done = int(self.num_workers_done.get("done"))
RuntimeError: Broken pipe
[07/28 15:48:26 INFO:] [train worker 1] Closing OnPolicyRLEngine.vector_tasks.  [engine.py: 665]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 ERROR:] Encountered Exception. Terminating runner.      [runner.py: 1251]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 ERROR:] Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/runner.py", line 1218, in log_and_close
    raise Exception(
Exception: Train worker 1 abnormally terminated
        [runner.py: 1252]
Traceback (most recent call last):
  File "/home/lixinting/code/embclip-allenact/allenact/algorithms/onpolicy_sync/runner.py", line 1218, in log_and_close
    raise Exception(
Exception: Train worker 1 abnormally terminated
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]
[07/28 15:48:26 INFO:] SingleProcessVectorSampledTask 0 closing.        [vector_sampled_tasks.py: 1021]

SmartAndCleverRobot commented 2 years ago

I found that the error occurs when the program reaches line 1433 of embclip-zeroshot/allenact/algorithms/onpolicy_sync/engine.py:

num_done = int(self.num_workers_done.get("done"))

self.num_workers_done is defined as follows:

self.num_workers_done = torch.distributed.PrefixStore(  # type:ignore
    "num_workers_done", self.store
)

What could be causing the above error?
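
A note on reading the log: in Python's multiprocessing module, `Connection.recv()` raises `EOFError` when nothing is left to receive and the other end of the pipe has been closed. The many `EOFError` tracebacks from the sampler workers therefore only show that their peer process closed its end of the pipe. The log alone (every entry carries the same timestamp) does not show which process failed first. A minimal sketch of that behaviour, independent of AllenAct, with a hypothetical function name chosen for illustration:

```python
from multiprocessing import Pipe


def recv_after_peer_closed() -> str:
    """Return the name of the exception raised by recv() once the peer end is closed."""
    parent_conn, child_conn = Pipe()
    # Simulate the process holding the other end of the pipe going away.
    parent_conn.close()
    try:
        child_conn.recv()
    except EOFError as exc:
        return type(exc).__name__
    finally:
        child_conn.close()
    return "no error"


if __name__ == "__main__":
    print(recv_after_peer_closed())
```

So the `RuntimeError: Broken pipe` raised at the `self.num_workers_done.get("done")` call is probably the more informative entry to investigate, although that is an interpretation of the log rather than something it proves.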

SmartAndCleverRobot commented 2 years ago

Solved! I replaced Python 3.10 with Python 3.8 and torch 1.11 with torch 1.8.1.
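
For anyone hitting the same error, a quick way to compare an environment against the combination reported above is sketched below. The target versions are taken from this comment and are an assumption about what works, not an officially documented requirement. The helper names are hypothetical.

```python
import sys
from importlib import metadata

# Versions reported as working in this thread (assumption, not an official requirement).
WORKING_PYTHON = (3, 8)
WORKING_TORCH = "1.8.1"


def installed_torch_version():
    """Return the installed torch version string, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(python_version, torch_version):
    """List differences between the given versions and the reported working ones."""
    problems = []
    if tuple(python_version[:2]) != WORKING_PYTHON:
        found = ".".join(str(part) for part in python_version[:2])
        problems.append(f"Python is {found}, expected 3.8")
    if torch_version is None:
        problems.append("torch is not installed")
    elif torch_version.split("+")[0] != WORKING_TORCH:
        # Drop a local build tag such as "+cu111" before comparing.
        problems.append(f"torch is {torch_version}, expected {WORKING_TORCH}")
    return problems


if __name__ == "__main__":
    issues = version_mismatches(sys.version_info, installed_torch_version())
    print("\n".join(issues) if issues else "Environment matches the reported working versions.")
```

Running this inside the conda environment used for training prints any mismatch before a long training job is launched.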