aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

ERROR - Error in Nanny killing Worker subprocess #558

ASNbioinf closed this issue 3 months ago

ASNbioinf commented 3 months ago

When I run pySCENIC with Docker/Podman, pyscenic grn returns the error "Error in Nanny killing Worker subprocess".

I can't tell whether it's because I passed --rm as an argument, which automatically removes the container once it exits.

Moreover, the log further down seems to treat the problem as resolved, with the message "The above exception was the direct cause of the following exception... ".

It did eventually write the results to the file, but I'm not sure whether they are correct or whether some information is wrong because of the error.

Please help me understand the error so I can figure out how to rerun the analysis.

NB: I used 150 GB and 20 CPUs.

docker run -it --rm \
    -v /data:/data \
    aertslab/pyscenic:0.12.1 pyscenic grn \
        --num_workers 20 \
        -o /data/expr_mat.adjacencies.tsv \
        /data/expr_mat.loom \
        /data/allTFs_hg38.txt
2024-06-25 21:09:47,005 - pyscenic.cli.pyscenic - INFO - Inferring regulatory networks.
2024-06-26 16:12:16,777 - distributed.nanny - WARNING - Worker process still alive after 3.1999992370605472 seconds, killing
2024-06-26 16:12:16,778 - distributed.nanny - WARNING - Worker process still alive after 3.1999989318847657 seconds, killing
2024-06-26 16:12:16,778 - distributed.nanny - WARNING - Worker process still alive after 3.199999389648438 seconds, killing
2024-06-26 16:12:16,778 - distributed.nanny - WARNING - Worker process still alive after 3.1999995422363288 seconds, killing
2024-06-26 16:12:16,779 - distributed.nanny - WARNING - Worker process still alive after 3.199999389648438 seconds, killing
2024-06-26 16:12:16,779 - distributed.nanny - WARNING - Worker process still alive after 3.199999389648438 seconds, killing
2024-06-26 16:12:16,779 - distributed.nanny - WARNING - Worker process still alive after 3.1999992370605472 seconds, killing
2024-06-26 16:12:16,779 - distributed.nanny - WARNING - Worker process still alive after 3.199999389648438 seconds, killing
2024-06-26 16:12:16,779 - distributed.nanny - WARNING - Worker process still alive after 3.1999990844726565 seconds, killing
2024-06-26 16:12:16,779 - distributed.nanny - WARNING - Worker process still alive after 3.1999986267089846 seconds, killing
2024-06-26 16:12:16,780 - distributed.nanny - WARNING - Worker process still alive after 3.1999990844726565 seconds, killing
2024-06-26 16:12:16,780 - distributed.nanny - WARNING - Worker process still alive after 3.1999992370605472 seconds, killing
2024-06-26 16:12:16,780 - distributed.nanny - WARNING - Worker process still alive after 3.1999989318847657 seconds, killing
2024-06-26 16:12:16,782 - distributed.nanny - WARNING - Worker process still alive after 3.1999992370605472 seconds, killing
2024-06-26 16:12:16,782 - distributed.nanny - WARNING - Worker process still alive after 3.199998474121094 seconds, killing
2024-06-26 16:12:16,782 - distributed.nanny - WARNING - Worker process still alive after 3.199999389648438 seconds, killing
2024-06-26 16:12:16,782 - distributed.nanny - WARNING - Worker process still alive after 3.1999992370605472 seconds, killing
2024-06-26 16:12:17,493 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 580, in close
await self.kill(timeout=timeout)
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 372, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 781, in kill
await process.join(max(0, deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/process.py", line 304, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2024-06-26 16:12:17,495 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 580, in close
await self.kill(timeout=timeout)
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 372, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 781, in kill
await process.join(max(0, deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/process.py", line 304, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2024-06-26 16:12:17,496 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 580, in close
await self.kill(timeout=timeout)
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 372, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 781, in kill
await process.join(max(0, deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/process.py", line 304, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2024-06-26 16:12:17,498 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 580, in close
await self.kill(timeout=timeout)
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 372, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 781, in kill
await process.join(max(0, deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/process.py", line 304, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2024-06-26 16:12:17,499 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 580, in close
await self.kill(timeout=timeout)
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 372, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 781, in kill
await process.join(max(0, deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/process.py", line 304, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2024-06-26 16:12:17,503 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 580, in close
await self.kill(timeout=timeout)
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 372, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 781, in kill
await process.join(max(0, deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/process.py", line 304, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2024-06-26 16:12:17,542 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 580, in close
await self.kill(timeout=timeout)
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 372, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 781, in kill
await process.join(max(0, deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/process.py", line 304, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2024-06-26 16:12:17,545 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 580, in close
await self.kill(timeout=timeout)
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 372, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 781, in kill
await process.join(max(0, deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/process.py", line 304, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2024-06-26 16:12:17,568 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 580, in close
await self.kill(timeout=timeout)
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 372, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 781, in kill
await process.join(max(0, deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/process.py", line 304, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
2024-06-26 16:12:17,570 - distributed.nanny - ERROR - Error in Nanny killing Worker subprocess
Traceback (most recent call last):
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 580, in close
await self.kill(timeout=timeout)
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 372, in kill
await self.process.kill(timeout=0.8 * (deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/nanny.py", line 781, in kill
await process.join(max(0, deadline - time()))
File "/opt/venv/lib/python3.10/site-packages/distributed/process.py", line 304, in join
await asyncio.wait_for(asyncio.shield(self._exit_future), timeout)
File "/usr/local/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

2024-06-26 16:12:17,576 - pyscenic.cli.pyscenic - INFO - Writing results to file.
preparing dask client
parsing input
creating dask graph
20 partitions
computing dask graph
not shutting down client, client was created externally
finished
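
The log above ends with "Writing results to file." and "finished", so one rough way to gain some confidence in the output is a structural check of the written file. The sketch below is not part of pySCENIC: `check_adjacencies` is a hypothetical helper, and it assumes the adjacencies file is a tab-separated table with the columns TF, target and importance. It can only flag a truncated or malformed file; it cannot show that the inferred network itself is correct.

```python
import csv
import io
import math


def check_adjacencies(handle, expected_columns=("TF", "target", "importance")):
    """Structurally check an adjacencies table read from an open text handle.

    Hypothetical helper; the expected column names are an assumption
    about the file written by `pyscenic grn`.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = tuple(reader.fieldnames or ())
    if header != tuple(expected_columns):
        raise ValueError(f"unexpected header: {header!r}")

    n_rows = 0
    tfs = set()
    for line_no, row in enumerate(reader, start=2):
        raw = row.get("importance")
        if raw is None:
            # A short row usually means the file was cut off mid-write.
            raise ValueError(f"line {line_no}: missing importance value")
        value = float(raw)  # raises ValueError if the field is not numeric
        if not math.isfinite(value):
            raise ValueError(f"line {line_no}: non-finite importance {raw!r}")
        tfs.add(row["TF"])
        n_rows += 1

    return {"rows": n_rows, "tfs": len(tfs)}


# Tiny in-memory example with made-up gene names.
demo = io.StringIO("TF\ttarget\timportance\nTF1\tGENE1\t1.5\nTF1\tGENE2\t0.2\n")
print(check_adjacencies(demo))
```

For the run above, the helper would be given an open handle on /data/expr_mat.adjacencies.tsv. Comparing the number of distinct TFs in the summary with the number of TFs in the input list that are also present in the expression matrix is one further plausibility check.
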
ghuls commented 3 months ago

Can you try with fewer concurrent workers? In general, errors like this happen when you run out of RAM and one of the subprocesses is killed.
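
As a rough illustration of picking a smaller worker count, the sketch below caps the number of workers by both the available CPUs and a memory budget. `suggest_num_workers` is a hypothetical helper, not a pySCENIC or Dask function, and the per-worker memory figure in the example is made up; the real footprint depends on the expression matrix and would need to be measured.

```python
def suggest_num_workers(total_ram_gb, per_worker_gb, cpus, reserve_gb=0.0):
    """Return a worker count limited by both CPU count and memory budget.

    Hypothetical helper: per_worker_gb is an estimate the user supplies,
    and reserve_gb is memory held back for the main process and the OS.
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")

    usable_gb = max(0.0, total_ram_gb - reserve_gb)
    by_memory = int(usable_gb // per_worker_gb)
    # Always run at least one worker, never more than the CPUs available.
    return max(1, min(cpus, by_memory))


# Illustrative numbers only: 150 GB and 20 CPUs as in the report,
# with an assumed 15 GB per worker and 10 GB held in reserve.
print(suggest_num_workers(150, 15, 20, reserve_gb=10))
```

The resulting number would then be passed to `--num_workers` in place of 20 when rerunning `pyscenic grn`.
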