aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
178 stars 28 forks source link

run_pycistarget DEM error #258

Open avdl25 opened 10 months ago

avdl25 commented 10 months ago

Hello, In running one of my samples through SCENIC+, I am running into an error in the run_pycistarget wrapper function, specifically at running DEM without promoters for topics_otsu where my value is not 2D: ValueError: blocks must be 2-D. It seems this error was described in the pycistarget github errors, but I am unable to find a solution.

To reproduce the error:

from scenicplus.wrappers.run_pycistarget import run_pycistarget
ray.shutdown()
run_pycistarget(
    region_sets = region_sets,
    species = 'homo_sapiens',
    save_path = os.path.join(work_dir, 'motifs'),
    ctx_db_path = rankings_db,
    dem_db_path = scores_db,
    path_to_motif_annotations = motif_annotation,
    run_without_promoters = True,
    n_cpu = 1,
    _temp_dir = os.path.join('/home/jupyter/', 'ray_spill'),
    annotation_version = 'v10nr_clust',
      ##)

This error occurs after I successfully get through loading cisTarget database for topics_otsu, reading cisTarget database, running cisTarget for topics_otsu, running cisTarget without promoters for topics_otsu, and running DEM for topics_otsu.

Abridged log information:

 ... 
2023-11-13 18:26:26,616 pycisTarget_wrapper INFO     Running DEM without promoters for topics_otsu
2023-11-13 18:26:36,225 DEM          INFO     Creating contrast groups
2023-11-13 18:26:38,746 DEM          INFO     Computing DEM for Topic1
2023-11-13 18:26:51,340 DEM          INFO     Computing DEM for Topic2
2023-11-13 18:27:22,553 DEM          INFO     Computing DEM for Topic3
2023-11-13 18:27:29,403 DEM          INFO     Computing DEM for Topic4
2023-11-13 18:27:36,245 DEM          INFO     Computing DEM for Topic5
2023-11-13 18:27:43,135 DEM          INFO     Computing DEM for Topic6
2023-11-13 18:27:53,196 DEM          INFO     Computing DEM for Topic7
2023-11-13 18:28:00,522 DEM          INFO     Computing DEM for Topic8
2023-11-13 18:28:17,042 DEM          INFO     Computing DEM for Topic9
2023-11-13 18:28:23,456 DEM          INFO     Computing DEM for Topic10
2023-11-13 18:28:35,146 DEM          INFO     Computing DEM for Topic11
2023-11-13 18:28:42,845 DEM          INFO     Computing DEM for Topic12
2023-11-13 18:28:54,132 DEM          INFO     Computing DEM for Topic13

Entire output:

ValueError                                Traceback (most recent call last)
Cell In[17], line 3
      1 from scenicplus.wrappers.run_pycistarget import run_pycistarget
      2 ray.shutdown()
----> 3 run_pycistarget(
      4     region_sets = region_sets,
      5     species = 'homo_sapiens',
      6     save_path = os.path.join(work_dir, 'motifs'),
      7     ctx_db_path = rankings_db,
      8     dem_db_path = scores_db,
      9     path_to_motif_annotations = motif_annotation,
     10     run_without_promoters = True,
     11     n_cpu = 1,
     12     _temp_dir = os.path.join('/home/jupyter/', 'ray_spill'),
     13     annotation_version = 'v10nr_clust',
     14     )

File ~/scenicplus/src/scenicplus/wrappers/run_pycistarget.py:300, in run_pycistarget(region_sets, species, save_path, custom_annot, save_partial, ctx_db_path, dem_db_path, run_without_promoters, biomart_host, promoter_space, ctx_auc_threshold, ctx_nes_threshold, ctx_rank_threshold, dem_log2fc_thr, dem_motif_hit_thr, dem_max_bg_regions, annotation, motif_similarity_fdr, path_to_motif_annotations, annotation_version, n_cpu, _temp_dir, exclude_motifs, exclude_collection, **kwargs)
    298 db_regions = set(pd.concat([dem_db.regions_to_db[x] for x in dem_db.regions_to_db.keys()])['Query'])
    299 dem_db.regions_to_db = {x: target_to_query(regions_np[x], list(db_regions), fraction_overlap = 0.4) for x in regions_np.keys()}
--> 300 menr['DEM_'+key+'_No_promoters'] = DEM(dem_db = dem_db,
    301                region_sets = regions_np,
    302                log2fc_thr = dem_log2fc_thr,
    303                motif_hit_thr = dem_motif_hit_thr,
    304                max_bg_regions = dem_max_bg_regions,
    305                specie = species,
    306                promoter_space = promoter_space,
    307                motif_annotation = annotation,
    308                motif_similarity_fdr = motif_similarity_fdr, 
    309                path_to_motif_annotations = path_to_motif_annotations,
    310                n_cpu = n_cpu,
    311                annotation_version = annotation_version,
    312                tmp_dir = save_path,
    313                _temp_dir= _temp_dir,
    314                **kwargs)
    315 out_folder = os.path.join(save_path,'DEM_'+key+'_No_promoters')
    316 check_folder = os.path.isdir(out_folder)

File ~/.local/lib/python3.10/site-packages/pycistarget/motif_enrichment_dem.py:319, in DEM.__init__(self, dem_db, region_sets, specie, subset_motifs, contrasts, name, max_bg_regions, adjpval_thr, log2fc_thr, mean_fg_thr, motif_hit_thr, n_cpu, fraction_overlap, cluster_buster_path, path_to_genome_fasta, path_to_motifs, genome_annotation, promoter_space, path_to_motif_annotations, annotation_version, motif_annotation, motif_similarity_fdr, orthologous_identity_threshold, tmp_dir, **kwargs)
    317 self.cistromes = None
    318 if dem_db is not None:
--> 319     self.run(dem_db.db_scores, **kwargs)

File ~/.local/lib/python3.10/site-packages/pycistarget/motif_enrichment_dem.py:395, in DEM.run(self, dem_db_scores, **kwargs)
    393     sys.stderr = sys.__stderr__ 
    394 else:
--> 395     DEM_list = [DEM_internal(dem_db_scores,
    396                         region_groups[i],
    397                         contrasts_names[i],
    398                         adjpval_thr = self.adjpval_thr,
    399                         log2fc_thr = self.log2fc_thr,
    400                         mean_fg_thr = self.mean_fg_thr,
    401                         motif_hit_thr = self.motif_hit_thr) for i in range(len(contrasts))]
    402 self.motif_enrichment = {contrasts_names[i]: DEM_list[i][0] for i in range(len(DEM_list))} 
    403 db_motif_hits =  {contrasts_names[i]: DEM_list[i][1] for i in range(len(DEM_list))} 

File ~/.local/lib/python3.10/site-packages/pycistarget/motif_enrichment_dem.py:395, in <listcomp>(.0)
    393     sys.stderr = sys.__stderr__ 
    394 else:
--> 395     DEM_list = [DEM_internal(dem_db_scores,
    396                         region_groups[i],
    397                         contrasts_names[i],
    398                         adjpval_thr = self.adjpval_thr,
    399                         log2fc_thr = self.log2fc_thr,
    400                         mean_fg_thr = self.mean_fg_thr,
    401                         motif_hit_thr = self.motif_hit_thr) for i in range(len(contrasts))]
    402 self.motif_enrichment = {contrasts_names[i]: DEM_list[i][0] for i in range(len(DEM_list))} 
    403 db_motif_hits =  {contrasts_names[i]: DEM_list[i][1] for i in range(len(DEM_list))} 

File ~/.local/lib/python3.10/site-packages/pycistarget/motif_enrichment_dem.py:669, in DEM_internal(***failed resolving arguments***)
    667 keep_motifs = motif_df.index.tolist()
    668 keep_motifs_index = get_position_index(keep_motifs, motifs)
--> 669 scores_mat = sparse.vstack([fg_mat[keep_motifs_index,].T, bg_mat[keep_motifs_index,].T], format='csr').T
    670 regions = region_group[0] + ['Bg']*bg_mat.shape[1]
    671 labels = [1]*fg_mat.shape[1] + [0]*bg_mat.shape[1]

File ~/.local/lib/python3.10/site-packages/scipy/sparse/_construct.py:569, in vstack(blocks, format, dtype)
    538 def vstack(blocks, format=None, dtype=None):
    539     """
    540     Stack sparse matrices vertically (row wise)
    541 
   (...)
    567 
    568     """
--> 569     return bmat([[b] for b in blocks], format=format, dtype=dtype)
File ~/.local/lib/python3.10/site-packages/scipy/sparse/_construct.py:618, in bmat(blocks, format, dtype)
    615 blocks = np.asarray(blocks, dtype='object')
    617 if blocks.ndim != 2:
--> 618     raise ValueError('blocks must be 2-D')
    620 M,N = blocks.shape
    622 # check for fast path cases
ValueError: blocks must be 2-D

Version info:

Thank you for the help!

avdl25 commented 10 months ago

Update: I was able to work around this issue by removing the topic that was causing an error from the region_sets['topics_otsu'] with region_sets['topics_otsu'].pop('Topic13') because it wasn't a topic important to my cell type of interest; however, any advice on how to fix this issue for the future would be much appreciated.

SeppeDeWinter commented 10 months ago

Hi @avdl25

Can you check wether you still have the same issue with the development branch of the SCENIC+ / pycistarget code?

All the best, Seppe