aertslab / pySCENIC

pySCENIC is a lightning-fast Python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering), which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

pyscenic ctx terminates: pyarrow.lib.ArrowInvalid: Not a feather file #452

Closed: northNomad closed this issue 1 year ago

northNomad commented 1 year ago

Hi pySCENIC team,

I've downsampled my Seurat object to 500 cells and created my loom file using SCopeLoomR. `pyscenic grn` ran successfully, but when I run `pyscenic ctx` the command terminates during the "Calculating regulons" step.

  1. Command run when the error occurred:

    pyscenic ctx bm500_GRN_adjancies.csv \
    mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    --annotations_fname motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl.txt \
    --expression_mtx_fname bm500_test.loom \
    --output bm500_20221220_CTX_regulons.csv \
    --mask_dropouts \
    --num_workers 8

  2. Error encountered:

2022-12-20 15:19:36,395 - pyscenic.cli.pyscenic - INFO - Creating modules.

2022-12-20 15:19:37,645 - pyscenic.cli.pyscenic - INFO - Loading expression matrix.

2022-12-20 15:19:38,492 - pyscenic.utils - INFO - Calculating Pearson correlations.

2022-12-20 15:19:39,307 - pyscenic.utils - WARNING - Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI.
        Dropout masking is currently set to [True].

2022-12-20 15:19:52,555 - pyscenic.utils - INFO - Creating modules.

2022-12-20 15:22:09,398 - pyscenic.cli.pyscenic - INFO - Loading databases.

2022-12-20 15:22:09,401 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
[                                        ] | 0% Completed | 21.6s
Traceback (most recent call last):
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\User\miniconda3\envs\scenic_protocol\Scripts\pyscenic.exe\__main__.py", line 7, in <module>
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\pyscenic\cli\pyscenic.py", line 675, in main
    args.func(args)
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\pyscenic\cli\pyscenic.py", line 230, in prune_targets_command
    num_workers=args.num_workers,
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\pyscenic\prune.py", line 410, in prune2df
    module_chunksize,
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\pyscenic\prune.py", line 334, in _distributed_calc
    scheduler='processes', num_workers=num_workers if num_workers else cpu_count()
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\dask\base.py", line 283, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\dask\base.py", line 565, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\dask\multiprocessing.py", line 230, in get
    **kwargs
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\dask\local.py", line 487, in get_async
    raise_exception(exc, tb)
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\dask\local.py", line 316, in reraise
    raise exc.with_traceback(tb)
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\dask\local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\dask\core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\pyscenic\transform.py", line 301, in modules2df
    [module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\pyscenic\transform.py", line 301, in <listcomp>
    [module2df(db, module, motif_annotations, weighted_recovery, False, module2features_func) for module in modules]
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\pyscenic\transform.py", line 231, in module2df
    db, module, motif_annotations, weighted_recovery=weighted_recovery
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\pyscenic\transform.py", line 152, in module2features_auc1st_impl
    df = db.load(module)
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\ctxcore\rnkdb.py", line 318, in load
    gene_set = self.geneset.intersection(set(gs.genes))
  File "cytoolz\functoolz.pyx", line 475, in cytoolz.functoolz._memoize.__call__
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\ctxcore\rnkdb.py", line 98, in geneset
    return set(self.genes)
  File "cytoolz\functoolz.pyx", line 475, in cytoolz.functoolz._memoize.__call__
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\ctxcore\rnkdb.py", line 296, in genes
    reader = FeatherReader(self._fname)
  File "C:\Users\User\miniconda3\envs\scenic_protocol\lib\site-packages\pyarrow\feather.py", line 40, in __init__
    self.open(source)
  File "pyarrow\feather.pxi", line 83, in pyarrow.lib.FeatherReader.open
  File "pyarrow\error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Not a feather file

I re-installed pyarrow but the issue persisted.

Has anyone encountered a similar issue? Many thanks in advance.

northNomad commented 1 year ago

This seems to be a Windows issue: I ran the same command on a Linux system and saw no error.
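
For anyone landing here with the same `ArrowInvalid: Not a feather file` message, one way to narrow it down is to look at the magic bytes of the rankings database directly, independent of pyarrow. The sketch below is only a diagnostic under assumptions that this thread does not confirm: that the file might be truncated or corrupted (for example by an interrupted download), or that it is a Feather V2 (Arrow IPC) file being opened by a pyarrow too old to read that format. The helper name `sniff_feather` is made up for illustration and is not part of pySCENIC, ctxcore, or pyarrow.

```python
import os


def sniff_feather(path):
    """Classify a file by its magic bytes (hypothetical diagnostic helper).

    Feather V1 files start and end with b"FEA1".
    Feather V2 (Arrow IPC) files start and end with b"ARROW1".
    Anything else is reported as "unknown".
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        head = fh.read(6)               # first 6 bytes of the file
        fh.seek(max(size - 6, 0))       # jump to the last 6 bytes (or start if tiny)
        tail = fh.read(6)
    if head.startswith(b"FEA1") and tail.endswith(b"FEA1"):
        return "feather-v1"
    if head == b"ARROW1" and tail == b"ARROW1":
        return "feather-v2"
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python sniff_feather.py path/to/database.rankings.feather
    for fname in sys.argv[1:]:
        print(fname, "->", sniff_feather(fname))
```

Running this on `mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather` on both the Windows and the Linux machine would show whether the two copies of the file even agree. A result of `unknown` would point toward a damaged or incomplete file. A result of `feather-v2` would make the pyarrow version in the `scenic_protocol` environment worth comparing against the Linux one, since Feather V2 support only arrived in pyarrow 0.17. Neither outcome is established by this thread.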