aertslab / scenicplus

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Run GSEA hanging for hours #274

Open rogercasalsfr opened 11 months ago

rogercasalsfr commented 11 months ago

Discussed in https://github.com/aertslab/scenicplus/discussions/273

Originally posted by **rogercasalsfr** December 18, 2023

Hi everyone, I am struggling to get the last step of scenicplus to run, which is the GSEA step. My scplus_obj is:

```
SCENIC+ object with n_cells x n_genes = 7830 x 22787 and n_cells x n_regions = 7830 x 133660
    metadata_regions:'Chromosome', 'Start', 'End', 'Width', 'cisTopic_nr_frag', 'cisTopic_log_nr_frag', 'cisTopic_nr_acc', 'cisTopic_log_nr_acc'
    metadata_cell:'GEX_obs_id', 'GEX_cell_type', 'GEX_donor_id', 'GEX_louvain', 'ACC_cisTopic_nr_frag', 'ACC_cisTopic_log_nr_frag', 'ACC_cisTopic_nr_acc', 'ACC_cisTopic_log_nr_acc', 'ACC_sample_id', 'ACC_cell_type'
    menr:'CTX_topics_otsu_All', 'CTX_topics_otsu_No_promoters', 'DEM_topics_otsu_All', 'DEM_topics_otsu_No_promoters', 'CTX_topics_top_3_All', 'CTX_topics_top_3_No_promoters', 'DEM_topics_top_3_All', 'DEM_topics_top_3_No_promoters', 'CTX_DARs_All', 'CTX_DARs_No_promoters', 'DEM_DARs_All', 'DEM_DARs_No_promoters'
    dr_cell:'GEX_X_pca', 'GEX_X_umap'
```

I'm running the function on an HPC, locally. My code is this:

```
run_scenicplus(
    scplus_obj = scplus_obj,
    variable = ['GEX_celltype'],
    species = 'hsapiens',
    assembly = 'hg38',
    tf_file = '/path/TF_names_v_1.01.txt',
    save_path = os.path.join('/path_to_object/scplus_obj2.pkl'),
    #biomart_host = biomart_host,
    upstream = [1000, 150000],
    downstream = [1000, 150000],
    calculate_TF_eGRN_correlation = True,
    calculate_DEGs_DARs = True,
    export_to_loom_file = True,
    export_to_UCSC_file = False,
    path_bedToBigBed = '/path_to_Big_Bed',
    n_cpu = 16,
    _temp_dir = '/path_to_tmp_dir')
```

The earlier steps all finish and I get their output:

```
2023-12-18 11:27:59,744 R2G INFO Took 1223.3278999328613 seconds
2023-12-18 11:27:59,747 R2G INFO Calculating region to gene correlation, using SR method
2023-12-18 11:38:46,592 R2G INFO Took 646.8447403907776 seconds
2023-12-18 11:39:04,790 R2G INFO Done!
2023-12-18 11:39:05,064 SCENIC+_wrapper INFO Inferring TF to gene relationships
2023-12-18 11:39:14,411 TF2G INFO Calculating TF to gene correlation, using GBM method
2023-12-18 14:18:31,312 TF2G INFO Took 9556.900450229645 seconds
2023-12-18 14:18:31,332 TF2G INFO Adding correlation coefficients to adjacencies.
2023-12-18 14:19:35,223 TF2G INFO Warning: adding TFs as their own target to adjecencies matrix. Importance values will be max + 1e-05
2023-12-18 14:19:42,697 TF2G INFO Adding importance x rho scores to adjacencies.
2023-12-18 14:19:42,720 TF2G INFO Took 71.38793706893921 seconds
2023-12-18 14:19:43,015 SCENIC+_wrapper INFO Build eGRN
2023-12-18 14:19:43,016 GSEA INFO Thresholding region to gene relationships
2023-12-18 14:35:57,730 GSEA INFO Subsetting TF2G adjacencies for TF with motif.
2023-12-18 14:36:05,289 GSEA INFO Running GSEA...
```

But then I get this error:

```
(_ray_run_gsea_for_e_module pid=27911)   norm_tag = 1.0/sum_correl_tag
(_ray_run_gsea_for_e_module pid=27911) /storage/projects/uvic24/scenicplus_env/lib/python3.8/site-packages/gseapy/algorithm.py:74: RuntimeWarning: invalid value encountered in multiply
(_ray_run_gsea_for_e_module pid=27911)   RES = np.cumsum(tag_indicator * correl_vector * norm_tag - no_tag_indicator * norm_no_tag, axis=axis)
```

```
initializing: 1%| | 225/22065 [00:40<1:09:56, 5.20it/s]
initializing: 1%| | 226/22065 [00:40<1:12:12, 5.04it/s]
initializing: 1%| | 227/22065 [00:40<1:15:40, 4.81it/s]
initializing: 1%| | 228/22065 [00:40<1:14:12, 4.90it/s]
initializing: 1%| | 229/22065 [00:40<1:13:35, 4.94it/s]
initializing: 1%| | 230/22065 [00:41<1:11:12, 5.11it/s]
initializing: 1%| | 231/22065 [00:41<1:06:44, 5.45it/s]
initializing: 1%| | 232/22065 [00:41<1:05:32, 5.55it/s]
initializing: 1%| | 233/22065 [00:41<1:05:08, 5.59it/s]
initializing: 1%| | 234/22065 [00:41<1:05:42, 5.54it/s]
initializing: 1%| | 235/22065 [00:41<1:05:07, 5.59it/s]
initializing: 1%| | 236/22065 [00:42<1:05:40, 5.54it/s]
initializing: 1%| | 237/22065
```

After that I wait for hours for the GSEA step to finish, and it never does. Has anybody run into the same error, or does anybody know how to solve it? Thanks in advance.

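For readers puzzling over the `RuntimeWarning` quoted above: "invalid value encountered in multiply" is the message NumPy emits when an array multiplication produces `nan`, for example `0 * inf`. Given the quoted line `norm_tag = 1.0/sum_correl_tag`, one way to reach that state is a `sum_correl_tag` of zero. That is an assumption, since the post does not show the inputs. The sketch below is a simplified stand-in written for this note (the function name `running_enrichment_sum` and the toy arrays are hypothetical, and this is not gseapy's code); it only reproduces the warning in isolation and says nothing about why the run hangs.

```python
import warnings

import numpy as np


def running_enrichment_sum(tag_indicator, correl_vector):
    """Simplified weighted running sum; an illustration, not gseapy's code."""
    no_tag_indicator = 1 - tag_indicator
    # Sum of the weights of the genes that belong to the gene set ("hits").
    sum_correl_tag = np.sum(correl_vector * tag_indicator)
    # If every hit has weight 0 this is 1/0 = inf (NumPy warns, no exception).
    norm_tag = np.float64(1.0) / sum_correl_tag
    norm_no_tag = 1.0 / np.sum(no_tag_indicator)
    # 0 * inf is undefined, so NumPy returns nan and warns
    # "invalid value encountered in multiply".
    return np.cumsum(
        tag_indicator * correl_vector * norm_tag - no_tag_indicator * norm_no_tag
    )


# Toy input (assumed): both hits carry a weight of exactly zero.
tag_indicator = np.array([1, 0, 1, 0, 0])
zero_weights = np.array([0.0, 0.7, 0.0, 0.2, 0.1])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    res = running_enrichment_sum(tag_indicator, zero_weights)

messages = [str(w.message) for w in caught]
print(res)
print(messages)
```

In this toy case the whole running sum becomes `nan`, so any score derived from it is `nan` as well. Whether that happens for real gene sets in this run would have to be checked against the actual ranking and target gene lists.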
SeppeDeWinter commented 11 months ago

Hi @rogercasalsfr

See my answer on: https://github.com/aertslab/scenicplus/discussions/273

All the best, Seppe
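
A note on the call in the original post: it passes `variable = ['GEX_celltype']`, while the printed object lists `'GEX_cell_type'` under `metadata_cell`. This may just be a transcription difference in the post, and nothing in this thread establishes that it is related to the GSEA hang. A minimal pre-flight check, assuming the column names are available as a plain list (the helper `missing_variables` is hypothetical and does not use the SCENIC+ object API), could look like this:

```python
import difflib

# Column names copied from the `metadata_cell` line of the printed object.
metadata_cell_columns = [
    'GEX_obs_id', 'GEX_cell_type', 'GEX_donor_id', 'GEX_louvain',
    'ACC_cisTopic_nr_frag', 'ACC_cisTopic_log_nr_frag',
    'ACC_cisTopic_nr_acc', 'ACC_cisTopic_log_nr_acc',
    'ACC_sample_id', 'ACC_cell_type',
]

# Value passed as `variable` in the posted call.
variable = ['GEX_celltype']


def missing_variables(requested, available):
    """Return the requested names that are not among the available columns."""
    available = set(available)
    return [name for name in requested if name not in available]


missing = missing_variables(variable, metadata_cell_columns)

# For each missing name, suggest the closest existing column, if any.
suggestions = {
    name: difflib.get_close_matches(name, metadata_cell_columns, n=1)
    for name in missing
}

print(missing)
print(suggestions)
```

Running a check like this before a multi-hour job is cheap, and it catches a misspelled column name before the steps that depend on it.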