dmalzl opened 3 months ago
Looking through the code, I found that this can be fixed by running the function in local mode, i.e. by passing `client_or_address = 'custom_multiprocessing'`. This prompts `prune2df` to aggregate results with `pd.concat` instead of `dask.from_delayed`, and thus bypasses the underlying problem (see statement here). However, make sure you also pass the number of CPUs you want to use via `num_workers`, or your machine may be flooded with concurrent processes.
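The aggregation difference described above can be sketched with plain pandas. The frames below are made-up placeholders standing in for the per-worker partial results; this is only an illustration of the `pd.concat` code path, not pySCENIC's actual internals:

```python
import pandas as pd

# Hypothetical partial results, one per worker process; in the real pipeline
# these would be motif-enrichment frames produced for each module.
partial_results = [
    pd.DataFrame({'motif': ['M1', 'M2'], 'score': [0.9, 0.7]}),
    pd.DataFrame({'motif': ['M3'], 'score': [0.8]}),
]

# In 'custom_multiprocessing' mode the partial frames are combined eagerly
# with pd.concat, so no dask task graph (and no dask.from_delayed) is built.
combined = pd.concat(partial_results, ignore_index=True)
print(combined)
```

The point is simply that the eager pandas path never touches dask, which is why it sidesteps the error.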
I encountered the same error when running `pyscenic ctx` on the command line. After a series of runs I concluded that the problem was probably with dask, so I removed the `--mode "dask_multiprocessing"` parameter and it worked fine. I guess this is a dask environment configuration issue.
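For reference, the CLI equivalent of the workaround from the first comment is to select the non-dask mode explicitly rather than just dropping the flag. The file names below are placeholders, not paths from this report:

```shell
# Placeholder file names; --mode custom_multiprocessing avoids the dask code path
pyscenic ctx adjacencies.tsv rankings.feather \
    --annotations_fname motifs.tbl \
    --expression_mtx_fname expression.loom \
    --mode custom_multiprocessing \
    --num_workers 8 \
    --output regulons.csv
```

As with the Python API, `--num_workers` should be set explicitly so the process count stays bounded.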
**Describe the bug**
I am following the full interactive pipeline as detailed in this notebook and am having trouble running the pruning stage. Using pySCENIC v0.12.1 (installed from source, since the PyPI package is broken), I get a dask-related error when running `prune2df`, and I could not find any related issue in this repo's issue tracker.

**Steps to reproduce the behavior**
```python
import anndata as ad

from arboreto.utils import load_tf_names
from arboreto.algo import grnboost2
from ctxcore.rnkdb import FeatherRankingDatabase as RankingDatabase
from dask.distributed import Client, LocalCluster  # needed for the client below
from pyscenic.utils import modules_from_adjacencies
from pyscenic.prune import prune2df, df2regulons
from pyscenic.aucell import aucell

adata = ad.read_h5ad()  # path omitted in the original report

with open('../scenic_resource/hs_hgnc_tfs.txt', 'r') as tf_file:
    tf_names = [line.rstrip() for line in tf_file]

cistarget_db = RankingDatabase(
    '../scenic_resource/hg38refseq-r8010kb_up_and_down_tss.mc9nr.genes_vs_motifs.rankings.feather',
    'hg38refseq-r8010kb_up_and_down_tss.mc9nr'
)

# manually restrict number of workers used
client = Client(
    LocalCluster(name='grn_call', n_workers=8, threads_per_worker=1)
)

adjacencies = grnboost2(
    expression_data=adata.to_df('counts'),  # convert AnnData to pandas.DataFrame
    tf_names=tf_names,
    client_or_address=client,
    verbose=True
)

inferred_modules = list(
    modules_from_adjacencies(adjacencies, adata.to_df('counts'))
)
```
(This is actually executed as part of a dict comprehension, because I am computing GRNs for multiple datasets, but the error also occurs when running it like this, so I kept the code as above for brevity.)
```python
prune2df(
    [cistarget_db],  # the ranking database defined above
    inferred_modules,
    '../scenic_resource/motifs-v9-nr.hgnc-m0.001-o0.0.tbl',
    client_or_address=client
)
```
**Expected behavior**
It simply runs without error, as all the passed arguments comply with the types inferred from the above-mentioned notebook.
**Please complete the following information:**
- Package versions [obtained via `pip freeze`, `conda list`, or skip this if using Docker/Singularity]: