BASALT-2022-Karlsruhe / ka-basalt-2022

MIT License

Broken Pipe Error when training on full dataset #69

Closed lauritowal closed 1 year ago

lauritowal commented 1 year ago

I get the following error when training on the full dataset (which is stored on wombat):

2022-09-25 02:42:03,503 Time: 1.05, Batches: 0, Avrg loss: 0.1861
2022-09-25 02:43:39,108 Time: 96.66, Batches: 100, Avrg loss: 7.7870
2022-09-25 02:45:11,174 Time: 188.72, Batches: 200, Avrg loss: 4.8232
2022-09-25 02:46:47,698 Time: 285.25, Batches: 300, Avrg loss: 4.6650
2022-09-25 02:48:26,269 Time: 383.82, Batches: 400, Avrg loss: 4.4562
2022-09-25 02:50:03,362 Time: 480.91, Batches: 500, Avrg loss: 4.3550
2022-09-25 02:51:35,792 Time: 573.34, Batches: 600, Avrg loss: 4.2437
2022-09-25 02:52:59,796 Time: 657.34, Batches: 700, Avrg loss: 4.2581
2022-09-25 02:54:36,283 Time: 753.83, Batches: 800, Avrg loss: 4.2259
2022-09-25 02:56:13,565 Time: 851.11, Batches: 900, Avrg loss: 4.1023
2022-09-25 02:57:49,973 Time: 947.52, Batches: 1000, Avrg loss: 4.1106
2022-09-25 02:59:26,497 Time: 1044.04, Batches: 1100, Avrg loss: 4.1556
2022-09-25 03:01:03,209 Time: 1140.76, Batches: 1200, Avrg loss: 4.2170
2022-09-25 03:02:41,089 Time: 1238.64, Batches: 1300, Avrg loss: 4.1900
2022-09-25 03:04:19,740 Time: 1337.29, Batches: 1400, Avrg loss: 4.2176
2022-09-25 03:05:59,382 Time: 1436.93, Batches: 1500, Avrg loss: 4.2210
2022-09-25 03:07:38,463 Time: 1536.01, Batches: 1600, Avrg loss: 4.1511
2022-09-25 03:09:15,217 Time: 1632.76, Batches: 1700, Avrg loss: 4.0200
2022-09-25 03:10:52,003 Time: 1729.55, Batches: 1800, Avrg loss: 4.0430
2022-09-25 03:12:30,811 Time: 1828.36, Batches: 1900, Avrg loss: 4.1787
2022-09-25 03:14:06,926 Time: 1924.47, Batches: 2000, Avrg loss: 3.9809
Traceback (most recent call last):
  File "/home/aicrowd/.conda/envs/minerl/lib/python3.8/multiprocessing/queues.py", line 245, in _feed
    send_bytes(obj)
  File "/home/aicrowd/.conda/envs/minerl/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/aicrowd/.conda/envs/minerl/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/home/aicrowd/.conda/envs/minerl/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
[The identical BrokenPipeError traceback repeats 37 more times.]
2022-09-25 03:14:13,431 End training
Traceback (most recent call last):
  File "/home/aicrowd/.conda/envs/minerl/lib/python3.8/multiprocessing/queues.py", line 245, in _feed
    send_bytes(obj)
  File "/home/aicrowd/.conda/envs/minerl/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/home/aicrowd/.conda/envs/minerl/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes
    self._send(header + buf)
  File "/home/aicrowd/.conda/envs/minerl/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
lauritowal commented 1 year ago

TODO: Check out the following https://www.geeksforgeeks.org/broken-pipe-error-in-python/

lauritowal commented 1 year ago

Answer from BASALT support (5 October, 21:11):

Miffyli (8:59 PM): Huuh... That would mean some of the processes just died outright. Have you checked your memory usage and if it approaches using all memory?

walt (9:02 PM): Memory for CPUs and GPU was actually fine. It does not approach using all memory... Around half was used

Miffyli (9:03 PM): Hm... Was 2000 batches the limit? The closing code might be a bit derp, where the main thread just quits at the 2000 batches and others crash. I thought I had more cleaner closing for that tho

walt (9:04 PM): yes. 2000 was the limit: MAX_BATCHES = 2000 if USING_FULL_DATASET else int(1e9)

Miffyli (9:05 PM): ah yup that probably is the reason. The run finished fine, it just did not clean up the threads as cleanly as I thought it would 😅

walt (9:06 PM): mm but the training stops though after a few times when that appears

Miffyli (9:07 PM): Yeap, but that is expected, as it did reach 2000 batches. But yes, if those processes crash like that, the main code will probably hang, trying to join the processes and just halt there 😅. You could add a timeout to the join operation as a quick fix, and if timeout passes, catch the exception with try-except block. Yes, not a real fix and I should look into, but I can not promise when I have time

walt (9:09 PM): Alright, thanks a lot!!
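The quick fix suggested above (joining with a timeout) could look roughly like the sketch below. It assumes the data loader's workers are `multiprocessing.Process` objects, which this thread does not confirm, and `join_workers` is a hypothetical helper name rather than anything in the baseline code. Note that `Process.join(timeout)` returns silently when the timeout passes instead of raising, so the sketch checks `is_alive()` rather than using try-except.

```python
import multiprocessing as mp
from typing import Iterable, List


def join_workers(workers: Iterable[mp.Process], timeout: float = 10.0) -> List[mp.Process]:
    """Join each worker, waiting at most `timeout` seconds per worker.

    Returns the workers that were still running after the timeout and
    therefore had to be terminated. (Hypothetical helper, not part of the
    baseline code.)
    """
    stuck = []
    for worker in workers:
        # join() does not raise on timeout; it simply returns.
        worker.join(timeout)
        if worker.is_alive():
            # The worker did not exit in time: stop it so the main
            # process does not hang forever waiting for it.
            worker.terminate()
            worker.join(timeout)
            stuck.append(worker)
    return stuck
```

Calling something like `join_workers(processes, timeout=5.0)` at the end of training would let the main process exit even if a worker hangs. As noted in the chat, this is a workaround and not a fix for the underlying shutdown order.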

lauritowal commented 1 year ago

We have set the maximum number of batches (MAX_BATCHES) to 1e9 and now save the models every 1000 batches. The error no longer appears.
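A minimal sketch of that workaround, assuming a generic training loop: the `train`, `train_step`, and `save_checkpoint` names and the `SAVE_EVERY` constant are hypothetical stand-ins, not the repository's actual functions. Only `MAX_BATCHES` and the 1000-batch interval come from this thread.

```python
from typing import Any, Callable, Iterable

MAX_BATCHES = int(1e9)  # effectively no limit, as described above
SAVE_EVERY = 1000       # checkpoint interval in batches (assumed name)


def train(
    batches: Iterable[Any],
    train_step: Callable[[Any], None],
    save_checkpoint: Callable[[int], None],
    max_batches: int = MAX_BATCHES,
    save_every: int = SAVE_EVERY,
) -> int:
    """Run train_step on each batch and checkpoint every `save_every` batches.

    Returns the number of batches processed.
    """
    n = 0
    for n, batch in enumerate(batches, start=1):
        train_step(batch)
        if n % save_every == 0:
            # Periodic saves mean a usable model exists even if the run
            # is interrupted before the data is exhausted.
            save_checkpoint(n)
        if n >= max_batches:
            break
    return n
```

With the batch limit this high, the loop ends when the data runs out rather than at a fixed batch count, which is the situation in which the error stopped appearing.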