aertslab / scenicplus

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Run GSEA hanging for hours #274

Open rogercasalsfr opened 6 months ago

rogercasalsfr commented 6 months ago

Discussed in https://github.com/aertslab/scenicplus/discussions/273

Originally posted by **rogercasalsfr** December 18, 2023

Hi everyone,

I am struggling to get the last step of scenicplus to run, which is the GSEA step.

My scplus_obj is:

```
SCENIC+ object with n_cells x n_genes = 7830 x 22787 and n_cells x n_regions = 7830 x 133660
	metadata_regions:'Chromosome', 'Start', 'End', 'Width', 'cisTopic_nr_frag', 'cisTopic_log_nr_frag', 'cisTopic_nr_acc', 'cisTopic_log_nr_acc'
	metadata_cell:'GEX_obs_id', 'GEX_cell_type', 'GEX_donor_id', 'GEX_louvain', 'ACC_cisTopic_nr_frag', 'ACC_cisTopic_log_nr_frag', 'ACC_cisTopic_nr_acc', 'ACC_cisTopic_log_nr_acc', 'ACC_sample_id', 'ACC_cell_type'
	menr:'CTX_topics_otsu_All', 'CTX_topics_otsu_No_promoters', 'DEM_topics_otsu_All', 'DEM_topics_otsu_No_promoters', 'CTX_topics_top_3_All', 'CTX_topics_top_3_No_promoters', 'DEM_topics_top_3_All', 'DEM_topics_top_3_No_promoters', 'CTX_DARs_All', 'CTX_DARs_No_promoters', 'DEM_DARs_All', 'DEM_DARs_No_promoters'
	dr_cell:'GEX_X_pca', 'GEX_X_umap'
```

I'm running the function on an HPC, locally. My code is this:

```
run_scenicplus(
    scplus_obj = scplus_obj,
    variable = ['GEX_celltype'],
    species = 'hsapiens',
    assembly = 'hg38',
    tf_file = '/path/TF_names_v_1.01.txt',
    save_path = os.path.join('/path_to_object/scplus_obj2.pkl'),
    #biomart_host = biomart_host,
    upstream = [1000, 150000],
    downstream = [1000, 150000],
    calculate_TF_eGRN_correlation = True,
    calculate_DEGs_DARs = True,
    export_to_loom_file = True,
    export_to_UCSC_file = False,
    path_bedToBigBed = '/path_to_Big_Bed',
    n_cpu = 16,
    _temp_dir = '/path_to_tmp_dir')
```

I already have all the outputs:

```
2023-12-18 11:27:59,744 R2G          INFO     Took 1223.3278999328613 seconds
2023-12-18 11:27:59,747 R2G          INFO     Calculating region to gene correlation, using SR method
2023-12-18 11:38:46,592 R2G          INFO     Took 646.8447403907776 seconds
2023-12-18 11:39:04,790 R2G          INFO     Done!
2023-12-18 11:39:05,064 SCENIC+_wrapper INFO     Inferring TF to gene relationships
2023-12-18 11:39:14,411 TF2G         INFO     Calculating TF to gene correlation, using GBM method
2023-12-18 14:18:31,312 TF2G         INFO     Took 9556.900450229645 seconds
2023-12-18 14:18:31,332 TF2G         INFO     Adding correlation coefficients to adjacencies.
2023-12-18 14:19:35,223 TF2G         INFO     Warning: adding TFs as their own target to adjecencies matrix. Importance values will be max + 1e-05
2023-12-18 14:19:42,697 TF2G         INFO     Adding importance x rho scores to adjacencies.
2023-12-18 14:19:42,720 TF2G         INFO     Took 71.38793706893921 seconds
2023-12-18 14:19:43,015 SCENIC+_wrapper INFO     Build eGRN
2023-12-18 14:19:43,016 GSEA         INFO     Thresholding region to gene relationships
2023-12-18 14:35:57,730 GSEA         INFO     Subsetting TF2G adjacencies for TF with motif.
2023-12-18 14:36:05,289 GSEA         INFO     Running GSEA...
```

But then I get this error:

```
(_ray_run_gsea_for_e_module pid=27911)   norm_tag = 1.0/sum_correl_tag
(_ray_run_gsea_for_e_module pid=27911) /storage/projects/uvic24/scenicplus_env/lib/python3.8/site-packages/gseapy/algorithm.py:74: RuntimeWarning: invalid value encountered in multiply
(_ray_run_gsea_for_e_module pid=27911)   RES = np.cumsum(tag_indicator * correl_vector * norm_tag - no_tag_indicator * norm_no_tag, axis=axis)
```

```
initializing: 1%| | 225/22065 [00:40<1:09:56, 5.20it/s]
initializing: 1%| | 226/22065 [00:40<1:12:12, 5.04it/s]
initializing: 1%| | 227/22065 [00:40<1:15:40, 4.81it/s]
initializing: 1%| | 228/22065 [00:40<1:14:12, 4.90it/s]
initializing: 1%| | 229/22065 [00:40<1:13:35, 4.94it/s]
initializing: 1%| | 230/22065 [00:41<1:11:12, 5.11it/s]
initializing: 1%| | 231/22065 [00:41<1:06:44, 5.45it/s]
initializing: 1%| | 232/22065 [00:41<1:05:32, 5.55it/s]
initializing: 1%| | 233/22065 [00:41<1:05:08, 5.59it/s]
initializing: 1%| | 234/22065 [00:41<1:05:42, 5.54it/s]
initializing: 1%| | 235/22065 [00:41<1:05:07, 5.59it/s]
initializing: 1%| | 236/22065 [00:42<1:05:40, 5.54it/s]
initializing: 1%| | 237/22065
```

I then wait for hours, but the GSEA step never finishes.

Has anybody run into the same error, or does anyone know how to solve it?

Thanks in advance.

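For readers unfamiliar with the `RuntimeWarning` quoted in the log, the following is a minimal sketch of how NumPy can produce that message. It assumes, without confirmation from the log, that `sum_correl_tag` was zero for some gene set; the toy arrays and the function name `running_sum_demo` are hypothetical and are not gseapy's real inputs or API. It only illustrates the warning and says nothing about why the GSEA step takes so long.

```python
import warnings

import numpy as np


def running_sum_demo():
    """Reproduce the NumPy warnings with toy arrays (hypothetical data)."""
    # Assumption: every gene in the set has a correlation weight of 0,
    # so the normalising sum is exactly 0.
    tag_indicator = np.array([1.0, 0.0, 1.0, 0.0])
    correl_vector = np.array([0.0, 0.5, 0.0, -0.5])
    sum_correl_tag = np.sum(correl_vector * tag_indicator, keepdims=True)

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # Division by zero yields inf and a "divide by zero" warning.
        norm_tag = 1.0 / sum_correl_tag
        # 0 * inf yields NaN and "invalid value encountered in multiply".
        res = np.cumsum(tag_indicator * correl_vector * norm_tag)

    return norm_tag, res, [str(w.message) for w in caught]


if __name__ == "__main__":
    norm_tag, res, messages = running_sum_demo()
    print(norm_tag, res)
    for message in messages:
        print(message)
```

Under that assumption the running sum becomes NaN for the affected gene set, which NumPy reports as a warning rather than as an exception.
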
SeppeDeWinter commented 6 months ago

Hi @rogercasalsfr

See my answer on: https://github.com/aertslab/scenicplus/discussions/273

All the best, Seppe