Hello,
I'm trying to run the example in tutorial 2. I downloaded the files and ran the exact denoising command listed in the tutorial, but I always get the error ConnectionResetError: [Errno 104] Connection reset by peer. This is the full traceback:
Traceback (most recent call last):
File "/home/hhvu/.local/bin/atacworks", line 8, in <module>
sys.exit(main())
File "/home/hhvu/.local/lib/python3.7/site-packages/scripts/main.py", line 565, in main
ngpus_per_node, args, res_queue), join=True)
File "/home/hhvu/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/hhvu/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/hhvu/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/hhvu/.local/lib/python3.7/site-packages/scripts/worker.py", line 290, in infer_worker
pad=args.pad)
File "/home/hhvu/.local/lib/python3.7/site-packages/atacworks/dl4atac/infer.py", line 80, in infer
res_queue.put((idxes, batch_res))
File "<string>", line 2, in put
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/managers.py", line 834, in _callmethod
raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/managers.py", line 234, in serve_client
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/connection.py", line 251, in recv
return _ForkingPickler.loads(buf.getbuffer())
File "/home/hhvu/.local/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 284, in rebuild_storage_fd
fd = df.detach()
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/reduction.py", line 185, in recv_handle
return recvfds(s, 1)[0]
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/reduction.py", line 161, in recvfds
len(ancdata))
RuntimeError: received 0 items of ancdata
---------------------------------------------------------------------------
Process Process-2:
Traceback (most recent call last):
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/hhvu/.local/lib/python3.7/site-packages/scripts/main.py", line 217, in writer
if not res_queue.empty():
File "<string>", line 2, in empty
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
kind, result = conn.recv()
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
I'm on an NVIDIA GeForce GTX 1080 machine with 4 GPUs. I was able to run tutorial 1 successfully on this machine.
I appreciate any help. Thank you!