NVIDIA-Genomics-Research / AtacWorks

Deep learning based processing of Atac-seq data
https://clara-parabricks.github.io/AtacWorks/
Other

ConnectionResetError: [Errno 104] Connection reset by peer #246

Open hhvu0102 opened 2 years ago

hhvu0102 commented 2 years ago

Hello, I'm trying to run the example in tutorial 2. I downloaded the files and ran the exact denoising command listed in the tutorial, but I always get the error ConnectionResetError: [Errno 104] Connection reset by peer. This is the full traceback:

Traceback (most recent call last):
  File "/home/hhvu/.local/bin/atacworks", line 8, in <module>
    sys.exit(main())
  File "/home/hhvu/.local/lib/python3.7/site-packages/scripts/main.py", line 565, in main
    ngpus_per_node, args, res_queue), join=True)
  File "/home/hhvu/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/home/hhvu/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/hhvu/.local/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/hhvu/.local/lib/python3.7/site-packages/scripts/worker.py", line 290, in infer_worker
    pad=args.pad)
  File "/home/hhvu/.local/lib/python3.7/site-packages/atacworks/dl4atac/infer.py", line 80, in infer
    res_queue.put((idxes, batch_res))
  File "<string>", line 2, in put
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/managers.py", line 834, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/managers.py", line 234, in serve_client
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "/home/hhvu/.local/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 284, in rebuild_storage_fd
    fd = df.detach()
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/reduction.py", line 185, in recv_handle
    return recvfds(s, 1)[0]
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata
---------------------------------------------------------------------------

Process Process-2:
Traceback (most recent call last):
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/hhvu/.local/lib/python3.7/site-packages/scripts/main.py", line 217, in writer
    if not res_queue.empty():
  File "<string>", line 2, in empty
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/opt/rit/spack-app/linux-rhel7-x86_64/gcc-4.8.5/python-3.7.7-b5s6jni4fu45wd4rns43cetmu4u6grxz/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
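Reading the traceback from the top down, the ConnectionResetError in the writer process (Process-2, polling res_queue.empty() in scripts/main.py) looks like a follow-on failure. The first error is in Process 0: res_queue.put((idxes, batch_res)) in atacworks/dl4atac/infer.py sends result tensors through a multiprocessing Manager queue, the manager process rebuilds them with rebuild_storage_fd, and receiving the file descriptor fails with RuntimeError: received 0 items of ancdata. Once that worker dies, the main process likely exits and shuts down the manager, which would reset the writer's connection. One commonly suggested workaround for the ancdata error, not confirmed for this issue, is to switch PyTorch's tensor-sharing strategy from passing file descriptors to naming shared-memory files. The helper name below is hypothetical. Because processes started by torch.multiprocessing.spawn begin with the default strategy, the call would have to run in every process that puts tensors on the queue (for example at the start of the per-GPU worker), as well as in the main process before the Manager is created.

```python
import torch.multiprocessing as tmp


def use_file_system_sharing():
    """Hypothetical helper: prefer the "file_system" sharing strategy.

    With the "file_descriptor" strategy (the Linux default), each shared
    CPU storage is sent to another process by passing a file descriptor
    over a UNIX socket; that is the recvfds() step failing in the
    traceback. "file_system" passes a shared-memory file name instead.
    The setting is a per-process global, so each worker must set it too.
    """
    if "file_system" in tmp.get_all_sharing_strategies():
        tmp.set_sharing_strategy("file_system")
    return tmp.get_sharing_strategy()


# Hypothetical placement: first statement of the per-GPU worker function,
# and in the main process before the Manager / result queue is created.
strategy = use_file_system_sharing()
print(strategy)  # file_system
```

The PyTorch documentation notes that "file_system" can leave shared memory behind if a process crashes, so it trades the file-descriptor limit for that risk.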

I'm on an NVIDIA GeForce GTX 1080 machine with 4 GPUs. I was able to run tutorial 1 successfully on this machine. I'd appreciate any help. Thank you!
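With 4 GPUs there is presumably one inference worker per GPU, all sending result tensors through the same queue, so the number of open file descriptors can grow quickly. The ancdata error is also commonly reported when a process hits its open-file limit (ulimit -n). A minimal sketch, assuming Linux and a hypothetical helper name, that reports RLIMIT_NOFILE and raises the soft limit up to the hard limit for the current process. Limits are inherited by child processes, so this would need to run before the workers start; raising ulimit -n in the shell before launching atacworks is the equivalent outside Python.

```python
import resource


def raise_open_file_limit(target=None):
    """Hypothetical helper: raise the soft RLIMIT_NOFILE for this process.

    Returns (old_soft, new_soft, hard). Child processes started afterwards
    inherit the new limit. Without privileges, the soft limit can only be
    raised up to the hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, soft, hard  # already unlimited; nothing to do
    if target is None:
        if hard == resource.RLIM_INFINITY:
            return soft, soft, hard  # no finite ceiling to raise to
        target = hard
    elif hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # cannot exceed the hard limit
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # e.g. target above a kernel-imposed maximum
    new_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, new_soft, hard


old_soft, new_soft, hard = raise_open_file_limit()
print(f"RLIMIT_NOFILE soft: {old_soft} -> {new_soft} (hard: {hard})")
```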