aertslab / pySCENIC

pySCENIC is a lightning-fast python implementation of the SCENIC pipeline (Single-Cell rEgulatory Network Inference and Clustering) which enables biologists to infer transcription factors, gene regulatory networks and cell types from single-cell RNA-seq data.
http://scenic.aertslab.org
GNU General Public License v3.0

CLI ctx fails on "dask_cluster" mode [BUG] #462

Closed by carlos-a-enriquez 1 year ago

carlos-a-enriquez commented 1 year ago

Describe the bug

I am running pySCENIC in a distributed cloud environment rather than on an HPC system. The problem occurs when running ctx in "dask_cluster" mode, which is the only mode suited to my infrastructure (custom_multiprocessing, or even dask_multiprocessing, would not let me use my resources efficiently).

The source of the bug can be easily identified in this line of code: https://github.com/aertslab/pySCENIC/blob/master/src/pyscenic/cli/pyscenic.py#L243, where args.mode is passed as client_or_address to prune2df().

Why is this a problem? Choosing the "dask_cluster" mode requires me to supply my Dask cluster's address through an extra CLI argument, client_or_address. However, because args.mode is passed as prune2df()'s client_or_address keyword argument, prune2df() receives the string "dask_cluster" instead, and rejects it because it is not a valid address.

The solution would be to replace args.mode with args.client_or_address on this particular line (src/pyscenic/cli/pyscenic.py#L243).
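A minimal sketch of the proposed change, using a simplified stand-in for the CLI handler rather than the actual pySCENIC source. Only the option names `--mode` and `--client_or_address` and the keyword `client_or_address` come from this report; `fake_prune2df`, `build_parser`, `run_buggy`, `run_fixed`, the default values, and the example address `10.0.0.5:8786` are assumptions made for illustration.

```python
import argparse


def fake_prune2df(client_or_address):
    """Hypothetical stand-in for prune2df(); it only echoes what it received."""
    return client_or_address


def build_parser():
    # Only the two options relevant to this report are modelled here.
    # The default values are assumptions, not taken from the real CLI.
    parser = argparse.ArgumentParser(prog="ctx-sketch")
    parser.add_argument("--mode", default="custom_multiprocessing")
    parser.add_argument("--client_or_address", default="local")
    return parser


def run_buggy(args):
    # Behaviour described in the report: the mode string is forwarded.
    return fake_prune2df(client_or_address=args.mode)


def run_fixed(args):
    # Proposed behaviour: forward the user-supplied scheduler address.
    return fake_prune2df(client_or_address=args.client_or_address)


args = build_parser().parse_args(
    ["--mode", "dask_cluster", "--client_or_address", "10.0.0.5:8786"]
)
print(run_buggy(args))  # the mode name, which is not an address
print(run_fixed(args))  # the scheduler address given on the command line
```

With the one-word change in `run_fixed`, the value typed after `--client_or_address` is what reaches the pruning function, which is the behaviour the "dask_cluster" mode needs.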

Steps to reproduce the behavior

  1. Command run when the error occurred:

    !pyscenic ctx {f_adj_csv} \
    {f_db_names} \
    --annotations_fname {MOTIF_ANNOTATIONS_FNAME} \
    --expression_mtx_fname {f_loom_path_scenic} \
    --output {f_motifs_csv} \
    --mask_dropouts \
    --mode "dask_cluster" \
    --client_or_address {dask_scheduler_IP}

  2. Error encountered:

    2023-03-15 10:51:11,205 - pyscenic.cli.pyscenic - INFO - Calculating regulons.
    Traceback (most recent call last):
      File "/opt/conda/bin/pyscenic", line 8, in <module>
        sys.exit(main())
      File "/opt/conda/lib/python3.8/site-packages/pyscenic/cli/pyscenic.py", line 713, in main
        args.func(args)
      File "/opt/conda/lib/python3.8/site-packages/pyscenic/cli/pyscenic.py", line 236, in prune_targets_command
        df_motifs = calc_func(
      File "/opt/conda/lib/python3.8/site-packages/pyscenic/prune.py", line 424, in prune2df
        return _distributed_calc(
      File "/opt/conda/lib/python3.8/site-packages/pyscenic/prune.py", line 205, in _distributed_calc
        assert is_valid(
    AssertionError: "dask_cluster"is not valid for parameter client_or_address.
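The assertion at the end of the traceback can be pictured with the rough sketch below. It is not the real `is_valid` from `pyscenic/prune.py`: the reserved keyword set, the address pattern, and the names `looks_like_valid_target` and `check` are all hypothetical, and the real check presumably also accepts a Dask client object, which is not modelled here. The point is only that a mode name such as "dask_cluster" does not look like a scheduler address.

```python
import re

# Hypothetical set of reserved keywords; the real list may differ.
RESERVED_KEYWORDS = {"local"}

# Loose pattern for "host:port", optionally prefixed with a scheme such as tcp://.
ADDRESS_PATTERN = re.compile(r"^(?:[a-z]+://)?[A-Za-z0-9.\-]+:\d{1,5}$")


def looks_like_valid_target(value):
    """Return True for a reserved keyword or a host:port style string."""
    if not isinstance(value, str):
        return False
    return value in RESERVED_KEYWORDS or bool(ADDRESS_PATTERN.match(value))


def check(value):
    # Mirrors the shape of the failing assertion, with an illustrative message.
    assert looks_like_valid_target(value), (
        "{!r} is not valid for parameter client_or_address.".format(value)
    )
    return value


print(looks_like_valid_target("dask_cluster"))         # a mode name, no host:port
print(looks_like_valid_target("tcp://10.0.0.5:8786"))  # example scheduler address
```

Under these assumptions, passing the mode name fails the check while a scheduler address passes, which matches the error seen when `args.mode` is forwarded.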

Expected behavior

I expect {dask_scheduler_IP} to be passed to prune2df() as the client_or_address argument, not the string "dask_cluster".

ghuls commented 1 year ago

It should be fixed in master now: https://github.com/aertslab/pySCENIC/commit/b9d3609fb8d0c302564ad4274c2ebb679505cbc7