aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

'dask' and '!pyscenic ctx' #120

Closed grimwoo closed 4 years ago

grimwoo commented 4 years ago

When I run

!pyscenic ctx adj.csv \
    {f_db_names} \
    --annotations_fname {f_motif_path} \
    --expression_mtx_fname {f_loom_path_unfilt} \
    --output Step6_reg.csv \
    --mask_dropouts \
    --num_workers 10

it always fails with an error.

According to a previous issue, this may be caused by the "dask" package. However, even after trying about half of the historical versions of "dask", I still get the error.

The error with the latest "dask" version is as follows:

[                                        ] | 0% Completed |  1min 24.9s
Traceback (most recent call last):
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/bin/pyscenic", line 11, in <module>
    sys.exit(main())
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pyscenic/cli/pyscenic.py", line 408, in main
    args.func(args)
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pyscenic/cli/pyscenic.py", line 159, in prune_targets_command
    num_workers=args.num_workers)
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pyscenic/prune.py", line 351, in prune2df
    num_workers, module_chunksize)
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/pyscenic/prune.py", line 300, in _distributed_calc
    return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/dask/base.py", line 165, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/dask/base.py", line 436, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/dask/multiprocessing.py", line 215, in get
    **kwargs
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/dask/local.py", line 315, in reraise
    raise exc.with_traceback(tb)
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/dask/core.py", line 119, in _execute_task
    return func(*args2)
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/dask/dataframe/utils.py", line 653, in check_meta
    check_matching_columns(meta, x)
  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/dask/dataframe/utils.py", line 678, in check_matching_columns
    "  Missing: %s" % (extra, missing)
ValueError: The columns in the computed data do not match the columns in the provided metadata
  Extra:   []
  Missing: []
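
A note on the empty "Extra" and "Missing" lists: one plausible reading is that the computed data and the metadata contain the same column names but in a different order, which would be consistent with the "Order of columns does not match" message reported further down this thread by a newer dask. The following is a plain-Python sketch of that idea, not dask's actual check; the function name and the column names are hypothetical.

```python
def describe_column_mismatch(meta_columns, computed_columns):
    """Describe how the computed columns differ from the expected (meta) columns."""
    meta = list(meta_columns)
    computed = list(computed_columns)
    if meta == computed:
        return "columns match"
    # Set differences ignore ordering, so both can be empty
    # even though the two lists are not equal.
    extra = sorted(set(computed) - set(meta))
    missing = sorted(set(meta) - set(computed))
    if extra or missing:
        return f"Extra: {extra} Missing: {missing}"
    return "same columns, different order"


# Hypothetical column names, for illustration only.
expected = ["a", "b", "c"]
print(describe_column_mismatch(expected, ["b", "a", "c"]))
print(describe_column_mismatch(expected, ["a", "b", "d"]))
```

The first call shows the case where nothing is extra or missing yet the comparison still fails, because only the order differs.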

When I install an older version of "dask", the error sometimes looks like this (it may also show "from dask ...."):

  File "/public-supool/home/wuhaoda/anaconda2/envs/Grim3.6.8/lib/python3.6/site-packages/distributed/config.py", line 11, in <module>
    config = dask.config.config
AttributeError: module 'dask' has no attribute 'config'

lucygarner commented 4 years ago

Hi,

I am also getting this error.

Traceback (most recent call last):
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/bin/pyscenic", line 8, in <module>
    sys.exit(main())
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/pyscenic/cli/pyscenic.py", line 420, in main
    args.func(args)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/pyscenic/cli/pyscenic.py", line 159, in prune_targets_command
    df_motifs = calc_func(dbs, modules, motif_annotations_fname,
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/pyscenic/prune.py", line 349, in prune2df
    return _distributed_calc(rnkdbs, modules, motif_annotations_fname, transformation_func, aggregation_func,
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/pyscenic/prune.py", line 300, in _distributed_calc
    return create_graph().compute(scheduler='processes', num_workers=num_workers if num_workers else cpu_count())
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/dask/base.py", line 166, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/dask/base.py", line 444, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/dask/multiprocessing.py", line 208, in get
    result = get_async(
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/dask/local.py", line 486, in get_async
    raise_exception(exc, tb)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/dask/local.py", line 316, in reraise
    raise exc
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/dask/local.py", line 222, in execute_task
    result = _execute_task(task, data)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/dask/core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/dask/dataframe/utils.py", line 655, in check_meta
    check_matching_columns(meta, x)
  File "/data/user/lucy/py36-v1/conda-install/envs/pyscenic/lib/python3.8/site-packages/dask/dataframe/utils.py", line 680, in check_matching_columns
    raise ValueError(
ValueError: The columns in the computed data do not match the columns in the provided metadata
Order of columns does not match 

Was there a recommended solution?

Best, Lucy

cflerin commented 4 years ago

Hi @grimwoo, @lc822, sorry for not replying to this earlier. This is a common issue with Dask versions. You can find suggestions for fixing it in #163.

BioAIEvolu commented 4 years ago

Same problem here +1 👍

The package versions are:

scanpy==1.4.4.post1
anndata==0.6.22.post1
umap==0.4.3
numpy==1.17.4
scipy==1.4.1
pandas==0.25.3
scikit-learn==0.23.1
statsmodels==0.11.1
pyscenic==0.10.0
dask==2.17.2
distributed==2.11.0
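
For anyone comparing environments, a version list like the one above can be generated rather than typed by hand. This is a minimal sketch that assumes Python 3.8 or newer (for the standard-library importlib.metadata module); the helper name installed_versions is hypothetical, and the package list is only the subset most relevant to this error.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Print the interpreter version followed by the packages involved in this issue.
print("python", sys.version.split()[0])
for name, ver in installed_versions(["pyscenic", "dask", "distributed", "pandas"]).items():
    print(name, ver if ver is not None else "not installed")
```

Running it inside the same conda environment that runs pyscenic matters, since the tracebacks in this thread come from environment-specific site-packages directories.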

I tried pip install dask==1.0.0 distributed'>=1.21.6,<2.0.0' as suggested in #163, but it doesn't work for me.

Then I tried both the latest version and an older version of pandas, but I get the same error:

2020-06-04 11:25:13,096 - pyscenic.cli.pyscenic - INFO - Loading databases.

2020-06-04 11:25:13,096 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
[                                        ] | 0% Completed | 13.7s
(omit)
 File "/home/miniconda3/envs/ScCancer/lib/python3.8/site-packages/dask/dataframe/utils.py", line 680, in check_matching_columns
    raise ValueError(
ValueError: The columns in the computed data do not match the columns in the provided metadata
Order of columns does not match

2020-06-04 11:26:18,390 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for ZNF486 could be mapped to hg19-tss-centered-10kb-10species.mc9nr. Skipping this module.

2020-06-04 11:26:20,011 - pyscenic.transform - WARNING - Less than 80% of the genes in Regulon for ZNF492 could be mapped to hg19-tss-centered-10kb-10species.mc9nr. Skipping this module.

Is my Python version too new?