aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

pyarrow.lib.ArrowInvalid: Not a feather file #468

Open qiruicheng opened 1 year ago

qiruicheng commented 1 year ago

I am currently using pyscenic ctx for analysis, but I encountered the following error:

Traceback (most recent call last):
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/pyscenic/cli/pyscenic.py", line 675, in main
    args.func(args)
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/pyscenic/cli/pyscenic.py", line 230, in prune_targets_command
    num_workers=args.num_workers,
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/pyscenic/prune.py", line 410, in prune2df
    module_chunksize,
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/pyscenic/prune.py", line 334, in _distributed_calc
    scheduler='processes', num_workers=num_workers if num_workers else cpu_count()
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/dask/base.py", line 283, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/dask/base.py", line 565, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/dask/multiprocessing.py", line 230, in get
    **kwargs
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/dask/local.py", line 487, in get_async
    raise_exception(exc, tb)
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/dask/local.py", line 316, in reraise
    raise exc.with_traceback(tb)
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/dask/core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/pyscenic/transform.py", line 301, in modules2df
    [module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/pyscenic/transform.py", line 301, in <listcomp>
    [module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/pyscenic/transform.py", line 231, in module2df
    db, module, motif_annotations, weighted_recovery=weighted_recovery
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/pyscenic/transform.py", line 152, in module2features_auc1st_impl
    df = db.load(module)
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/ctxcore/rnkdb.py", line 318, in load
    gene_set = self.geneset.intersection(set(gs.genes))
  File "cytoolz/functoolz.pyx", line 475, in cytoolz.functoolz._memoize.__call__
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/ctxcore/rnkdb.py", line 98, in geneset
    return set(self.genes)
  File "cytoolz/functoolz.pyx", line 475, in cytoolz.functoolz._memoize.__call__
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/ctxcore/rnkdb.py", line 296, in genes
    reader = FeatherReader(self._fname)
  File "/home/qiruicheng/anaconda3/envs/scenic_protocol/lib/python3.6/site-packages/pyarrow/feather.py", line 40, in __init__
    self.open(source)
  File "pyarrow/feather.pxi", line 83, in pyarrow.lib.FeatherReader.open
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Not a feather file

Command run when the error occurred:

f_loom_path_scenic=outputs/${sn}.loom

## outputs
grn_output=outputs/${sn}.adj.tsv
ctx_output=outputs_new/${sn}.reg.tsv
f_pyscenic_output=outputs_new/${sn}.pyscenic.loom

## reference
f_tfs=../cisTarget_databases/allTFs_mm.txt
# f_motif_path=../cisTarget_databases/motifs-v9-nr.mgi-m0.001-o0.0.tbl
# f_db_names=../cisTarget_databases/old_version/mm9-tss-centered-10kb-7species.mc9nr.feather
f_motif_path=../cisTarget_databases/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl
f_db_names=`find ../cisTarget_databases/ -name "mm10*10kbp*rankings.feather"`

#arboreto_with_multiprocessing.py \
#    $f_loom_path_scenic \
#    $f_tfs \
#    --method grnboost2 \
#    --output $grn_output \
#    --num_workers 20 \
#    --seed 777

pyscenic ctx \
    $grn_output \
    $f_db_names \
    --annotations_fname $f_motif_path \
    --expression_mtx_fname $f_loom_path_scenic \
    --output $ctx_output \
    --num_workers 10

I suspect that the issue may be with my feather file. I downloaded the mm10 v10 version of the feather file from the following link: (https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc_v10_clust/gene_based/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather)

However, even when I tried using the v9 version of the feather file, I still encountered the same error. (https://resources.aertslab.org/cistarget/databases/mus_musculus/mm10/refseq_r80/mc9nr/gene_based/mm10__refseq-r80__10kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather)

It is worth noting that when I used the previous mm9 version of the feather file, the program was able to run without any issues. (I used mm10 as a reference genome) Does this mean that I made a mistake while processing the mm10 version of the feather file? If anyone has any suggestions to help me solve this issue, please leave a comment below. Thank you!

xiepu969 commented 1 year ago

the same issue

qiruicheng commented 1 year ago

the same issue

This was probably caused by a wrong version of Python or of one of the packages. The issue disappeared after I created a fresh environment with python=3.10 and reinstalled pyscenic (my previous Python version was 3.6).

ghuls commented 1 year ago

The latest Feather files provided are in Feather v2 format (the Arrow IPC file format) and require pySCENIC >= 0.12.0.

In the pySCENIC v0.12.0 release notes, you can see:

Only databases in Feather v2 format are supported now (ctxcore >= 0.2), which allow uses recent versions of pyarrow (>=8.0.0) instead of very old ones (<0.17). Databases in the new format can be downloaded from https://resources.aertslab.org/cistarget/databases/ and end with .genes_vs_motifs.rankings.feather or .genes_vs_tracks.rankings.feather.
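To see which format a downloaded database is actually in, one option is to look at the first bytes of the file. The sketch below assumes the commonly documented magic strings: Feather v1 files begin with `FEA1`, and Arrow IPC (Feather v2) files begin with `ARROW1`. The function name `feather_format` is made up for illustration and is not part of pySCENIC or pyarrow.

```python
# Assumed magic strings: b"FEA1" for Feather v1, b"ARROW1" for Arrow IPC (Feather v2).
FEATHER_V1_MAGIC = b"FEA1"
ARROW_IPC_MAGIC = b"ARROW1"


def feather_format(path):
    """Return 'v1', 'v2' or 'unknown' based on the file's leading bytes.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(ARROW_IPC_MAGIC))
    if head.startswith(ARROW_IPC_MAGIC):
        return "v2"
    if head.startswith(FEATHER_V1_MAGIC):
        return "v1"
    # Neither magic string matched: possibly a truncated or corrupted download.
    return "unknown"
```

A result of `v2` combined with an old pyarrow (as pulled in by pySCENIC 0.11.x) would be consistent with the "Not a feather file" error above. A result of `unknown` suggests the download itself should be checked.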

hanjun98 commented 9 months ago

If you want to keep using pySCENIC 0.11.2, you can use the old-format databases from https://resources.aertslab.org/cistarget/databases/old
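The version rule described in this thread (0.12.0 and later need Feather v2 databases, earlier versions need the old ones) can be written as a small check. This is only a sketch: the 0.12 threshold is taken from the comments above, and the helper names `required_database_format` and `installed_version` are hypothetical. `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib import metadata  # standard library, Python >= 3.8


def required_database_format(pyscenic_version):
    """Map a pySCENIC version string to the database format it expects.

    Threshold (0.12) is an assumption based on the release notes quoted
    in this thread; hypothetical helper for illustration only.
    """
    major, minor = (int(part) for part in pyscenic_version.split(".")[:2])
    return "v2" if (major, minor) >= (0, 12) else "v1"


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyscenic")
    if version is None:
        print("pyscenic is not installed in this environment")
    else:
        print(f"pyscenic {version} expects Feather {required_database_format(version)} databases")
```

Running this inside the environment that produces the error shows which set of databases (current or `old`) matches the installed release.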