aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0
439 stars 181 forks

When using the Nextflow pipeline, I get an error: pyarrow.lib.ArrowInvalid: Not a feather file #488

Open daweibushen opened 1 year ago

daweibushen commented 1 year ago

Describe the bug

executor > local (7)
[31/6d1912] process > filter [100%] 1 of 1 ✔
[e7/4eb506] process > preprocess [100%] 1 of 1 ✔
[ba/6f6b33] process > pca [100%] 1 of 1 ✔
[19/243d25] process > visualize [100%] 1 of 1 ✔
[dd/034256] process > cluster [100%] 1 of 1 ✔
[4f/036523] process > GRNinference [100%] 1 of 1 ✔
[2a/29af41] process > cisTarget [ 0%] 0 of 1
[- ] process > AUCell -
[- ] process > visualizeAUC -
[- ] process > integrateOutput -

Error executing process > 'cisTarget'

Caused by: Process cisTarget terminated with an error exit status (1)

Command executed:

pyscenic ctx adj.tsv hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather --annotations_fname motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl --expression_mtx_fname filtered.loom --cell_id_attribute CellID --gene_attribute Gene --mode "dask_multiprocessing" --output reg.csv --num_workers 20

Command exit status: 1

Command output:

[ ] | 0% Completed | 13.9s
(the same 0% progress line repeats at roughly 0.1 s intervals)
[ ] | 0% Completed | 19.4s

Command error:

2023-07-07 17:27:29,158 - pyscenic.utils - INFO - Creating modules.

2023-07-07 17:29:22,365 - pyscenic.cli.pyscenic - INFO - Loading databases.

2023-07-07 17:29:22,366 - pyscenic.cli.pyscenic - INFO - Calculating regulons.

Traceback (most recent call last):
  File "/opt/venv/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 408, in main
    args.func(args)
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/cli/pyscenic.py", line 159, in prune_targets_command
    num_workers=args.num_workers)
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 351, in prune2df
    num_workers, module_chunksize)
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/prune.py", line 300, in _distributed_calc
    return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
  File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/opt/venv/lib/python3.7/site-packages/dask/base.py", line 397, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/opt/venv/lib/python3.7/site-packages/dask/multiprocessing.py", line 192, in get
    raise_exception=reraise, **kwargs)
  File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 501, in get_async
    raise_exception(exc, tb)
  File "/opt/venv/lib/python3.7/site-packages/dask/compatibility.py", line 111, in reraise
    raise exc.with_traceback(tb)
  File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 272, in execute_task
    result = _execute_task(task, data)
  File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 252, in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
  File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 252, in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]
  File "/opt/venv/lib/python3.7/site-packages/dask/local.py", line 253, in _execute_task
    return func(*args2)
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in modules2df
    for module in modules])
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 231, in <listcomp>
    for module in modules])
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 185, in module2df
    weighted_recovery=weighted_recovery)
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/transform.py", line 123, in module2features_auc1st_impl
    df = db.load(module)
  File "/opt/venv/lib/python3.7/site-packages/pyscenic/rnkdb.py", line 259, in load
    return FeatherReader(self._fname).read_pandas(columns=(INDEX_NAME,) + gs.genes).set_index(INDEX_NAME)
  File "/opt/venv/lib/python3.7/site-packages/pyarrow/feather.py", line 40, in __init__
    self.open(source)
  File "pyarrow/feather.pxi", line 83, in pyarrow.lib.FeatherReader.open
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Not a feather file

Note that most errors are due to the input from the user, and therefore should be treated as questions in the Discussions. Please only report them as bugs if you are quite certain that they are not behaving as expected.


ghuls commented 1 year ago

Could you check whether your downloaded Feather database is corrupted? Go to the directory that contains hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather, then download the checksum file and verify the database against it:

wget https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather.sha1sum.txt

sha1sum -c hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather.sha1sum.txt

If the download is not OK, remove the current hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather file and redownload:

wget https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
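The checksum comparison above can also be sketched in Python, which may be handy when `sha1sum` is not available or when both databases passed to `pyscenic ctx` need checking in one go. This is only a minimal sketch: the helper names `sha1_of_file`, `parse_sha1sum_line`, and `verify` are hypothetical, and it assumes the `.sha1sum.txt` file uses the usual `sha1sum` layout (hex digest, whitespace, file name) and sits in the same directory as the database.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of a file, read in chunks to keep memory low."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sha1sum_line(line):
    """Split one 'HEXDIGEST  filename' (or 'HEXDIGEST *filename') line."""
    expected, _, name = line.strip().partition(" ")
    return expected.lower(), name.lstrip(" *")


def verify(checksum_file):
    """Map each file listed in checksum_file to True (digest matches) or False."""
    checksum_path = Path(checksum_file)
    results = {}
    for line in checksum_path.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = parse_sha1sum_line(line)
        target = checksum_path.parent / name
        # A missing file counts as a failed check rather than raising.
        results[name] = target.is_file() and sha1_of_file(target) == expected
    return results
```

Calling `verify(...)` on each downloaded `.sha1sum.txt` file returns a dictionary of file names to booleans; any `False` entry suggests the corresponding database should be removed and downloaded again.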

Also, using the Docker/Podman/Singularity/Apptainer packaged version of pySCENIC is recommended: https://pyscenic.readthedocs.io/en/latest/installation.html#docker-podman-and-singularity-apptainer-images
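If the checksum matches and the error persists, a quick look at the file's leading and trailing bytes may help narrow things down. The sketch below is not part of pySCENIC; `sniff_feather` is a hypothetical helper. It assumes that Feather V1 files begin and end with the magic bytes `FEA1`, and that Feather V2 (Arrow IPC) files begin and end with `ARROW1`. To my understanding, pyarrow releases that predate Feather V2 cannot open V2 files, so a `feather-v2` result combined with an old environment would point at the software version rather than at the download.

```python
import os

# Assumed magic markers (not taken from the thread above).
FEATHER_V1_MAGIC = b"FEA1"
ARROW_IPC_MAGIC = b"ARROW1"


def sniff_feather(path):
    """Classify a file as 'feather-v1', 'feather-v2', 'truncated', or 'unknown'."""
    size = os.path.getsize(path)
    with open(path, "rb") as handle:
        head = handle.read(6)
        handle.seek(max(size - 6, 0))
        tail = handle.read(6)

    if head.startswith(FEATHER_V1_MAGIC):
        # A complete V1 file repeats the marker at the very end.
        complete = size >= 8 and tail.endswith(FEATHER_V1_MAGIC)
        return "feather-v1" if complete else "truncated"
    if head == ARROW_IPC_MAGIC:
        # A complete Arrow IPC file also ends with its marker.
        complete = size >= 12 and tail == ARROW_IPC_MAGIC
        return "feather-v2" if complete else "truncated"
    # Neither marker: for example an HTML error page saved by the downloader.
    return "unknown"
```

A result of `truncated` or `unknown` is consistent with a broken download, in which case removing the file and downloading it again, as described above, is the natural next step.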