aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

issue running cistarget using human dataset #87

Open sid5427 opened 1 year ago

sid5427 commented 1 year ago

Hi Seppe and other devs - Happy new year!

I am unfortunately facing another issue while running scenic with our human 10x multiome data.

When I run pycistarget it's throwing an error - "ValueError: A gene signature must have at least one gene."

Python version: 3.8.13. SCENIC+ version: not sure. I updated to the latest version on December 30th, but checking the version returns "AttributeError: module 'scenicplus' has no attribute 'version'" (you might want to check this as well).
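As an aside on the version lookup, here is a minimal sketch that reads the installed version from package metadata instead of a module attribute. It assumes the package was installed under the distribution name `scenicplus` (not confirmed in this thread), and `installed_version` is a hypothetical helper name.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# 'scenicplus' as the distribution name is an assumption.
print(installed_version("scenicplus"))
```

For a development install from a git checkout, the reported version may not identify the exact commit, so noting the commit hash alongside it can help.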

This is how I am setting up pycistarget to run ...

rankings_db = 'data/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather'
scores_db =  'data/hg38_screen_v10_clust.regions_vs_motifs.scores.feather'
motif_annotation = 'data/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl' 

##create paths for enriched motifs
if not os.path.exists('results/motifs'):
    os.makedirs('results/motifs')

from scenicplus.wrappers.run_pycistarget import run_pycistarget
run_pycistarget(
    region_sets = region_sets,
    species = 'homo_sapiens',
    save_path = 'results/motifs',
    ctx_db_path = rankings_db,
    dem_db_path = scores_db,
    path_to_motif_annotations = motif_annotation,
    #run_without_promoters = True,
    n_cpu = 8,
    _temp_dir = '/users/sen2qb/symlinks/temp_d_d/ray_spill',
    annotation_version = 'v10nr_clust',
    )

output log -

2023-01-09 16:50:26,509 pycisTarget_wrapper INFO     results/motifs folder already exists.
2023-01-09 16:50:28,061 pycisTarget_wrapper INFO     Loading cisTarget database for topics_otsu
2023-01-09 16:50:28,063 cisTarget    INFO     Reading cisTarget database
2023-01-09 16:58:16,508 pycisTarget_wrapper INFO     Running cisTarget for topics_otsu

2023-01-09 16:58:35,199 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

(ctx_internal_ray pid=135215) 2023-01-09 16:58:57,165 cisTarget    INFO     Running cisTarget for Topic1 which has 2683 regions
(ctx_internal_ray pid=135214) 2023-01-09 16:58:57,254 cisTarget    INFO     Running cisTarget for Topic2 which has 7472 regions
(ctx_internal_ray pid=135216) 2023-01-09 16:58:58,224 cisTarget    INFO     Running cisTarget for Topic3 which has 7003 regions
(ctx_internal_ray pid=135212) 2023-01-09 16:58:59,248 cisTarget    INFO     Running cisTarget for Topic4 which has 8008 regions
(ctx_internal_ray pid=135219) 2023-01-09 16:58:59,772 cisTarget    INFO     Running cisTarget for Topic5 which has 4396 regions
(ctx_internal_ray pid=135218) 2023-01-09 16:59:00,092 cisTarget    INFO     Running cisTarget for Topic6 which has 690 regions
(ctx_internal_ray pid=135217) 2023-01-09 16:59:00,544 cisTarget    INFO     Running cisTarget for Topic7 which has 871 regions
(ctx_internal_ray pid=135213) 2023-01-09 16:59:00,939 cisTarget    INFO     Running cisTarget for Topic8 which has 1500 regions
(ctx_internal_ray pid=135214) 2023-01-09 16:59:20,625 cisTarget    INFO     Annotating motifs for Topic2
(ctx_internal_ray pid=135214) 2023-01-09 16:59:23,420 cisTarget    INFO     Getting cistromes for Topic2
(ctx_internal_ray pid=135214) 2023-01-09 16:59:24,558 cisTarget    INFO     Running cisTarget for Topic9 which has 2254 regions
(ctx_internal_ray pid=135218) 2023-01-09 16:59:25,580 cisTarget    INFO     Annotating motifs for Topic6
(ctx_internal_ray pid=135219) 2023-01-09 16:59:26,070 cisTarget    INFO     Annotating motifs for Topic5
(ctx_internal_ray pid=135218) 2023-01-09 16:59:27,338 cisTarget    INFO     Getting cistromes for Topic6
(ctx_internal_ray pid=135218) 2023-01-09 16:59:27,664 cisTarget    INFO     Running cisTarget for Topic10 which has 2709 regions
(ctx_internal_ray pid=135219) 2023-01-09 16:59:28,210 cisTarget    INFO     Getting cistromes for Topic5
(ctx_internal_ray pid=135216) 2023-01-09 16:59:28,323 cisTarget    INFO     Annotating motifs for Topic3
(ctx_internal_ray pid=135215) 2023-01-09 16:59:28,390 cisTarget    INFO     Annotating motifs for Topic1
(ctx_internal_ray pid=135217) 2023-01-09 16:59:28,591 cisTarget    INFO     Annotating motifs for Topic7
(ctx_internal_ray pid=135219) 2023-01-09 16:59:29,130 cisTarget    INFO     Running cisTarget for Topic11 which has 3502 regions
(ctx_internal_ray pid=135215) 2023-01-09 16:59:30,299 cisTarget    INFO     Getting cistromes for Topic1
(ctx_internal_ray pid=135217) 2023-01-09 16:59:30,303 cisTarget    INFO     Getting cistromes for Topic7
(ctx_internal_ray pid=135216) 2023-01-09 16:59:30,495 cisTarget    INFO     Getting cistromes for Topic3
(ctx_internal_ray pid=135217) 2023-01-09 16:59:30,505 cisTarget    INFO     Running cisTarget for Topic12 which has 6396 regions
(ctx_internal_ray pid=135215) 2023-01-09 16:59:31,016 cisTarget    INFO     Running cisTarget for Topic13 which has 1821 regions
(ctx_internal_ray pid=135216) 2023-01-09 16:59:31,509 cisTarget    INFO     Running cisTarget for Topic14 which has 6555 regions
(ctx_internal_ray pid=135213) 2023-01-09 16:59:34,557 cisTarget    INFO     Annotating motifs for Topic8
(ctx_internal_ray pid=135212) 2023-01-09 16:59:36,161 cisTarget    INFO     Annotating motifs for Topic4
(ctx_internal_ray pid=135213) 2023-01-09 16:59:36,496 cisTarget    INFO     Getting cistromes for Topic8
(ctx_internal_ray pid=135213) 2023-01-09 16:59:37,284 cisTarget    INFO     Running cisTarget for Topic15 which has 2558 regions
(ctx_internal_ray pid=135212) 2023-01-09 16:59:38,378 cisTarget    INFO     Getting cistromes for Topic4
(ctx_internal_ray pid=135212) 2023-01-09 16:59:39,515 cisTarget    INFO     Running cisTarget for Topic16 which has 4249 regions
(ctx_internal_ray pid=135214) 2023-01-09 16:59:49,186 cisTarget    INFO     Annotating motifs for Topic9
(ctx_internal_ray pid=135214) 2023-01-09 16:59:51,090 cisTarget    INFO     Getting cistromes for Topic9
(ctx_internal_ray pid=135214) 2023-01-09 16:59:51,626 cisTarget    INFO     Running cisTarget for Topic17 which has 3641 regions
(ctx_internal_ray pid=135218) 2023-01-09 16:59:52,462 cisTarget    INFO     Annotating motifs for Topic10
(ctx_internal_ray pid=135218) 2023-01-09 16:59:54,468 cisTarget    INFO     Getting cistromes for Topic10
(ctx_internal_ray pid=135218) 2023-01-09 16:59:55,062 cisTarget    INFO     Running cisTarget for Topic18 which has 3115 regions
(ctx_internal_ray pid=135215) 2023-01-09 16:59:58,649 cisTarget    INFO     Annotating motifs for Topic13
(ctx_internal_ray pid=135219) 2023-01-09 16:59:59,027 cisTarget    INFO     Annotating motifs for Topic11
(ctx_internal_ray pid=135215) 2023-01-09 17:00:00,584 cisTarget    INFO     Getting cistromes for Topic13
(ctx_internal_ray pid=135219) 2023-01-09 17:00:00,979 cisTarget    INFO     Getting cistromes for Topic11
(ctx_internal_ray pid=135215) 2023-01-09 17:00:01,249 cisTarget    INFO     Running cisTarget for Topic19 which has 4491 regions
(ctx_internal_ray pid=135217) 2023-01-09 17:00:01,349 cisTarget    INFO     Annotating motifs for Topic12
(ctx_internal_ray pid=135219) 2023-01-09 17:00:01,656 cisTarget    INFO     Running cisTarget for Topic20 which has 5425 regions
(ctx_internal_ray pid=135217) 2023-01-09 17:00:03,554 cisTarget    INFO     Getting cistromes for Topic12
(ctx_internal_ray pid=135216) 2023-01-09 17:00:03,891 cisTarget    INFO     Annotating motifs for Topic14
(ctx_internal_ray pid=135217) 2023-01-09 17:00:04,672 cisTarget    INFO     Running cisTarget for Topic21 which has 2658 regions
(ctx_internal_ray pid=135213) 2023-01-09 17:00:05,961 cisTarget    INFO     Annotating motifs for Topic15
(ctx_internal_ray pid=135216) 2023-01-09 17:00:06,166 cisTarget    INFO     Getting cistromes for Topic14
(ctx_internal_ray pid=135216) 2023-01-09 17:00:07,369 cisTarget    INFO     Running cisTarget for Topic22 which has 4600 regions
(ctx_internal_ray pid=135212) 2023-01-09 17:00:07,806 cisTarget    INFO     Annotating motifs for Topic16
(ctx_internal_ray pid=135213) 2023-01-09 17:00:07,982 cisTarget    INFO     Getting cistromes for Topic15
(ctx_internal_ray pid=135213) 2023-01-09 17:00:08,768 cisTarget    INFO     Running cisTarget for Topic23 which has 3878 regions
(ctx_internal_ray pid=135212) 2023-01-09 17:00:09,889 cisTarget    INFO     Getting cistromes for Topic16
(ctx_internal_ray pid=135212) 2023-01-09 17:00:10,822 cisTarget    INFO     Running cisTarget for Topic24 which has 6484 regions
(ctx_internal_ray pid=135214) 2023-01-09 17:00:20,196 cisTarget    INFO     Annotating motifs for Topic17
(ctx_internal_ray pid=135218) 2023-01-09 17:00:21,711 cisTarget    INFO     Annotating motifs for Topic18
(ctx_internal_ray pid=135214) 2023-01-09 17:00:22,376 cisTarget    INFO     Getting cistromes for Topic17
(ctx_internal_ray pid=135214) 2023-01-09 17:00:23,530 cisTarget    INFO     Running cisTarget for Topic25 which has 3604 regions
(ctx_internal_ray pid=135218) 2023-01-09 17:00:24,000 cisTarget    INFO     Getting cistromes for Topic18
(ctx_internal_ray pid=135218) 2023-01-09 17:00:25,434 cisTarget    INFO     Running cisTarget for Topic26 which has 6203 regions
(ctx_internal_ray pid=135219) 2023-01-09 17:00:29,809 cisTarget    INFO     Annotating motifs for Topic20
(ctx_internal_ray pid=135215) 2023-01-09 17:00:30,175 cisTarget    INFO     Annotating motifs for Topic19
(ctx_internal_ray pid=135219) 2023-01-09 17:00:32,346 cisTarget    INFO     Getting cistromes for Topic20
(ctx_internal_ray pid=135215) 2023-01-09 17:00:32,483 cisTarget    INFO     Getting cistromes for Topic19
(ctx_internal_ray pid=135217) 2023-01-09 17:00:32,618 cisTarget    INFO     Annotating motifs for Topic21
(ctx_internal_ray pid=135217) 2023-01-09 17:00:34,519 cisTarget    INFO     Getting cistromes for Topic21
(ctx_internal_ray pid=135216) 2023-01-09 17:00:35,464 cisTarget    INFO     Annotating motifs for Topic22
(ctx_internal_ray pid=135216) 2023-01-09 17:00:37,579 cisTarget    INFO     Getting cistromes for Topic22
(ctx_internal_ray pid=135213) 2023-01-09 17:00:38,689 cisTarget    INFO     Annotating motifs for Topic23
(ctx_internal_ray pid=135213) 2023-01-09 17:00:40,970 cisTarget    INFO     Getting cistromes for Topic23
(ctx_internal_ray pid=135212) 2023-01-09 17:00:41,362 cisTarget    INFO     Annotating motifs for Topic24
(ctx_internal_ray pid=135212) 2023-01-09 17:00:43,648 cisTarget    INFO     Getting cistromes for Topic24
(ctx_internal_ray pid=135214) 2023-01-09 17:00:47,327 cisTarget    INFO     Annotating motifs for Topic25
(ctx_internal_ray pid=135218) 2023-01-09 17:00:47,488 cisTarget    INFO     Annotating motifs for Topic26
(ctx_internal_ray pid=135214) 2023-01-09 17:00:49,153 cisTarget    INFO     Getting cistromes for Topic25
(ctx_internal_ray pid=135218) 2023-01-09 17:00:49,796 cisTarget    INFO     Getting cistromes for Topic26
2023-01-09 17:00:56,898 cisTarget    INFO     Done!
2023-01-09 17:00:56,903 pycisTarget_wrapper INFO     Created folder : results/motifs/CTX_topics_otsu_All
2023-01-09 17:00:57,533 pycisTarget_wrapper INFO     Running DEM for topics_otsu
2023-01-09 17:00:57,535 DEM          INFO     Reading DEM database
2023-01-09 17:05:54,352 DEM          INFO     Creating contrast groups

2023-01-09 17:06:23,981 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

(DEM_internal_ray pid=136708) 2023-01-09 17:06:50,842 DEM          INFO     Computing DEM for Topic1
(DEM_internal_ray pid=136712) 2023-01-09 17:06:52,131 DEM          INFO     Computing DEM for Topic2
(DEM_internal_ray pid=136713) 2023-01-09 17:06:52,434 DEM          INFO     Computing DEM for Topic6
(DEM_internal_ray pid=136711) 2023-01-09 17:06:52,647 DEM          INFO     Computing DEM for Topic7
(DEM_internal_ray pid=136710) 2023-01-09 17:06:52,784 DEM          INFO     Computing DEM for Topic3
(DEM_internal_ray pid=136709) 2023-01-09 17:06:52,917 DEM          INFO     Computing DEM for Topic8
(DEM_internal_ray pid=136714) 2023-01-09 17:06:53,228 DEM          INFO     Computing DEM for Topic5
(DEM_internal_ray pid=136707) 2023-01-09 17:06:53,503 DEM          INFO     Computing DEM for Topic4
(DEM_internal_ray pid=136708) 2023-01-09 17:06:57,700 DEM          INFO     Computing DEM for Topic9
(DEM_internal_ray pid=136710) 2023-01-09 17:06:58,965 DEM          INFO     Computing DEM for Topic10
(DEM_internal_ray pid=136707) 2023-01-09 17:07:00,126 DEM          INFO     Computing DEM for Topic11
(DEM_internal_ray pid=136709) 2023-01-09 17:07:00,839 DEM          INFO     Computing DEM for Topic13
(DEM_internal_ray pid=136712) 2023-01-09 17:07:01,429 DEM          INFO     Computing DEM for Topic12
(DEM_internal_ray pid=136714) 2023-01-09 17:07:02,329 DEM          INFO     Computing DEM for Topic14
(DEM_internal_ray pid=136713) 2023-01-09 17:07:03,336 DEM          INFO     Computing DEM for Topic15
(DEM_internal_ray pid=136711) 2023-01-09 17:07:04,646 DEM          INFO     Computing DEM for Topic16
(DEM_internal_ray pid=136707) 2023-01-09 17:07:06,136 DEM          INFO     Computing DEM for Topic17
(DEM_internal_ray pid=136712) 2023-01-09 17:07:07,656 DEM          INFO     Computing DEM for Topic18
(DEM_internal_ray pid=136714) 2023-01-09 17:07:09,169 DEM          INFO     Computing DEM for Topic19
(DEM_internal_ray pid=136708) 2023-01-09 17:07:15,762 DEM          INFO     Computing DEM for Topic20
(DEM_internal_ray pid=136707) 2023-01-09 17:07:15,929 DEM          INFO     Computing DEM for Topic21
(DEM_internal_ray pid=136709) 2023-01-09 17:07:18,049 DEM          INFO     Computing DEM for Topic22
(DEM_internal_ray pid=136711) 2023-01-09 17:07:21,557 DEM          INFO     Computing DEM for Topic23
(DEM_internal_ray pid=136714) 2023-01-09 17:07:23,525 DEM          INFO     Computing DEM for Topic24
(DEM_internal_ray pid=136709) 2023-01-09 17:07:24,388 DEM          INFO     Computing DEM for Topic25
(DEM_internal_ray pid=136707) 2023-01-09 17:07:26,658 DEM          INFO     Computing DEM for Topic26
2023-01-09 17:07:49,100 DEM          INFO     Forming cistromes
2023-01-09 17:07:59,043 DEM          INFO     Done!
2023-01-09 17:08:04,669 pycisTarget_wrapper INFO     Created folder : results/motifs/DEM_topics_otsu_All
2023-01-09 17:08:05,487 pycisTarget_wrapper INFO     Loading cisTarget database for topics_top_3
2023-01-09 17:08:05,488 cisTarget    INFO     Reading cisTarget database
2023-01-09 17:11:17,749 pycisTarget_wrapper INFO     Running cisTarget for topics_top_3

2023-01-09 17:11:27,812 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

(ctx_internal_ray pid=138080) 2023-01-09 17:11:50,897 cisTarget    INFO     Running cisTarget for Topic1 which has 3269 regions
(ctx_internal_ray pid=138077) 2023-01-09 17:11:51,040 cisTarget    INFO     Running cisTarget for Topic2 which has 3595 regions
(ctx_internal_ray pid=138078) 2023-01-09 17:11:52,108 cisTarget    INFO     Running cisTarget for Topic3 which has 3678 regions
(ctx_internal_ray pid=138079) 2023-01-09 17:11:52,669 cisTarget    INFO     Running cisTarget for Topic4 which has 3790 regions
(ctx_internal_ray pid=138074) 2023-01-09 17:11:53,292 cisTarget    INFO     Running cisTarget for Topic5 which has 3428 regions
(ctx_internal_ray pid=138075) 2023-01-09 17:11:53,748 cisTarget    INFO     Running cisTarget for Topic6 which has 3543 regions
(ctx_internal_ray pid=138073) 2023-01-09 17:11:54,196 cisTarget    INFO     Running cisTarget for Topic7 which has 3630 regions
(ctx_internal_ray pid=138076) 2023-01-09 17:11:54,286 cisTarget    INFO     Running cisTarget for Topic8 which has 3832 regions
(ctx_internal_ray pid=138077) 2023-01-09 17:12:12,492 cisTarget    INFO     Annotating motifs for Topic2
(ctx_internal_ray pid=138077) 2023-01-09 17:12:14,462 cisTarget    INFO     Getting cistromes for Topic2
(ctx_internal_ray pid=138077) 2023-01-09 17:12:15,317 cisTarget    INFO     Running cisTarget for Topic9 which has 3372 regions
(ctx_internal_ray pid=138075) 2023-01-09 17:12:17,210 cisTarget    INFO     Annotating motifs for Topic6
(ctx_internal_ray pid=138080) 2023-01-09 17:12:18,413 cisTarget    INFO     Annotating motifs for Topic1
(ctx_internal_ray pid=138075) 2023-01-09 17:12:19,137 cisTarget    INFO     Getting cistromes for Topic6
(ctx_internal_ray pid=138074) 2023-01-09 17:12:19,815 cisTarget    INFO     Annotating motifs for Topic5
(ctx_internal_ray pid=138075) 2023-01-09 17:12:19,728 cisTarget    INFO     Running cisTarget for Topic10 which has 3470 regions
(ctx_internal_ray pid=138080) 2023-01-09 17:12:20,193 cisTarget    INFO     Getting cistromes for Topic1
(ctx_internal_ray pid=138080) 2023-01-09 17:12:20,920 cisTarget    INFO     Running cisTarget for Topic11 which has 3553 regions
(ctx_internal_ray pid=138074) 2023-01-09 17:12:21,751 cisTarget    INFO     Getting cistromes for Topic5
(ctx_internal_ray pid=138073) 2023-01-09 17:12:21,883 cisTarget    INFO     Annotating motifs for Topic7
(ctx_internal_ray pid=138074) 2023-01-09 17:12:22,414 cisTarget    INFO     Running cisTarget for Topic12 which has 3981 regions
(ctx_internal_ray pid=138079) 2023-01-09 17:12:22,490 cisTarget    INFO     Annotating motifs for Topic4
(ctx_internal_ray pid=138076) 2023-01-09 17:12:23,682 cisTarget    INFO     Annotating motifs for Topic8
(ctx_internal_ray pid=138078) 2023-01-09 17:12:23,775 cisTarget    INFO     Annotating motifs for Topic3
(ctx_internal_ray pid=138073) 2023-01-09 17:12:23,737 cisTarget    INFO     Getting cistromes for Topic7
(ctx_internal_ray pid=138073) 2023-01-09 17:12:24,313 cisTarget    INFO     Running cisTarget for Topic13 which has 3447 regions
(ctx_internal_ray pid=138079) 2023-01-09 17:12:24,425 cisTarget    INFO     Getting cistromes for Topic4
(ctx_internal_ray pid=138079) 2023-01-09 17:12:25,120 cisTarget    INFO     Running cisTarget for Topic14 which has 3970 regions
(ctx_internal_ray pid=138076) 2023-01-09 17:12:25,614 cisTarget    INFO     Getting cistromes for Topic8
(ctx_internal_ray pid=138078) 2023-01-09 17:12:25,722 cisTarget    INFO     Getting cistromes for Topic3
(ctx_internal_ray pid=138078) 2023-01-09 17:12:26,402 cisTarget    INFO     Running cisTarget for Topic15 which has 3276 regions
(ctx_internal_ray pid=138076) 2023-01-09 17:12:26,663 cisTarget    INFO     Running cisTarget for Topic16 which has 3415 regions
(ctx_internal_ray pid=138077) 2023-01-09 17:12:30,842 cisTarget    INFO     Annotating motifs for Topic9
(ctx_internal_ray pid=138077) 2023-01-09 17:12:32,808 cisTarget    INFO     Getting cistromes for Topic9
(ctx_internal_ray pid=138077) 2023-01-09 17:12:33,499 cisTarget    INFO     Running cisTarget for Topic17 which has 3731 regions
(ctx_internal_ray pid=138080) 2023-01-09 17:12:40,586 cisTarget    INFO     Annotating motifs for Topic11
(ctx_internal_ray pid=138080) 2023-01-09 17:12:42,343 cisTarget    INFO     Getting cistromes for Topic11
(ctx_internal_ray pid=138080) 2023-01-09 17:12:42,915 cisTarget    INFO     Running cisTarget for Topic18 which has 3838 regions
(ctx_internal_ray pid=138075) 2023-01-09 17:12:43,710 cisTarget    INFO     Annotating motifs for Topic10
(ctx_internal_ray pid=138075) 2023-01-09 17:12:45,779 cisTarget    INFO     Getting cistromes for Topic10
(ctx_internal_ray pid=138075) 2023-01-09 17:12:46,448 cisTarget    INFO     Running cisTarget for Topic19 which has 3817 regions
(ctx_internal_ray pid=138076) 2023-01-09 17:12:47,486 cisTarget    INFO     Annotating motifs for Topic16
(ctx_internal_ray pid=138076) 2023-01-09 17:12:49,396 cisTarget    INFO     Getting cistromes for Topic16
(ctx_internal_ray pid=138076) 2023-01-09 17:12:50,159 cisTarget    INFO     Running cisTarget for Topic20 which has 4003 regions
(ctx_internal_ray pid=138073) 2023-01-09 17:12:50,711 cisTarget    INFO     Annotating motifs for Topic13
(ctx_internal_ray pid=138074) 2023-01-09 17:12:51,042 cisTarget    INFO     Annotating motifs for Topic12
(ctx_internal_ray pid=138073) 2023-01-09 17:12:52,745 cisTarget    INFO     Getting cistromes for Topic13
(ctx_internal_ray pid=138079) 2023-01-09 17:12:52,949 cisTarget    INFO     Annotating motifs for Topic14
(ctx_internal_ray pid=138074) 2023-01-09 17:12:53,047 cisTarget    INFO     Getting cistromes for Topic12
(ctx_internal_ray pid=138073) 2023-01-09 17:12:53,570 cisTarget    INFO     Running cisTarget for Topic21 which has 3309 regions
(ctx_internal_ray pid=138074) 2023-01-09 17:12:53,893 cisTarget    INFO     Running cisTarget for Topic22 which has 3774 regions
(ctx_internal_ray pid=138079) 2023-01-09 17:12:54,977 cisTarget    INFO     Getting cistromes for Topic14
(ctx_internal_ray pid=138079) 2023-01-09 17:12:55,752 cisTarget    INFO     Running cisTarget for Topic23 which has 3997 regions
(ctx_internal_ray pid=138078) 2023-01-09 17:12:56,224 cisTarget    INFO     Annotating motifs for Topic15
(ctx_internal_ray pid=138078) 2023-01-09 17:12:58,306 cisTarget    INFO     Getting cistromes for Topic15
(ctx_internal_ray pid=138078) 2023-01-09 17:12:59,182 cisTarget    INFO     Running cisTarget for Topic24 which has 3870 regions
(ctx_internal_ray pid=138077) 2023-01-09 17:12:59,841 cisTarget    INFO     Annotating motifs for Topic17
(ctx_internal_ray pid=138077) 2023-01-09 17:13:02,071 cisTarget    INFO     Getting cistromes for Topic17
(ctx_internal_ray pid=138077) 2023-01-09 17:13:03,306 cisTarget    INFO     Running cisTarget for Topic25 which has 3465 regions
(ctx_internal_ray pid=138080) 2023-01-09 17:13:04,487 cisTarget    INFO     Annotating motifs for Topic18
(ctx_internal_ray pid=138080) 2023-01-09 17:13:06,924 cisTarget    INFO     Getting cistromes for Topic18
(ctx_internal_ray pid=138075) 2023-01-09 17:13:07,678 cisTarget    INFO     Annotating motifs for Topic19
(ctx_internal_ray pid=138080) 2023-01-09 17:13:08,587 cisTarget    INFO     Running cisTarget for Topic26 which has 3928 regions
(ctx_internal_ray pid=138075) 2023-01-09 17:13:09,932 cisTarget    INFO     Getting cistromes for Topic19
(ctx_internal_ray pid=138076) 2023-01-09 17:13:12,149 cisTarget    INFO     Annotating motifs for Topic20
(ctx_internal_ray pid=138076) 2023-01-09 17:13:14,262 cisTarget    INFO     Getting cistromes for Topic20
(ctx_internal_ray pid=138074) 2023-01-09 17:13:16,666 cisTarget    INFO     Annotating motifs for Topic22
(ctx_internal_ray pid=138073) 2023-01-09 17:13:16,911 cisTarget    INFO     Annotating motifs for Topic21
(ctx_internal_ray pid=138074) 2023-01-09 17:13:18,713 cisTarget    INFO     Getting cistromes for Topic22
(ctx_internal_ray pid=138073) 2023-01-09 17:13:18,931 cisTarget    INFO     Getting cistromes for Topic21
(ctx_internal_ray pid=138079) 2023-01-09 17:13:21,074 cisTarget    INFO     Annotating motifs for Topic23
(ctx_internal_ray pid=138079) 2023-01-09 17:13:23,286 cisTarget    INFO     Getting cistromes for Topic23
(ctx_internal_ray pid=138077) 2023-01-09 17:13:23,652 cisTarget    INFO     Annotating motifs for Topic25
(ctx_internal_ray pid=138078) 2023-01-09 17:13:24,470 cisTarget    INFO     Annotating motifs for Topic24
(ctx_internal_ray pid=138077) 2023-01-09 17:13:25,550 cisTarget    INFO     Getting cistromes for Topic25
(ctx_internal_ray pid=138078) 2023-01-09 17:13:26,409 cisTarget    INFO     Getting cistromes for Topic24
(ctx_internal_ray pid=138080) 2023-01-09 17:13:29,032 cisTarget    INFO     Annotating motifs for Topic26
(ctx_internal_ray pid=138080) 2023-01-09 17:13:31,048 cisTarget    INFO     Getting cistromes for Topic26
2023-01-09 17:13:36,700 cisTarget    INFO     Done!
2023-01-09 17:13:36,706 pycisTarget_wrapper INFO     Created folder : results/motifs/CTX_topics_top_3_All
2023-01-09 17:13:37,328 pycisTarget_wrapper INFO     Running DEM for topics_top_3
2023-01-09 17:13:37,330 DEM          INFO     Reading DEM database
2023-01-09 17:16:24,839 DEM          INFO     Creating contrast groups

2023-01-09 17:16:55,424 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 

(DEM_internal_ray pid=139390) 2023-01-09 17:17:21,269 DEM          INFO     Computing DEM for Topic1
(DEM_internal_ray pid=139389) 2023-01-09 17:17:21,557 DEM          INFO     Computing DEM for Topic2
(DEM_internal_ray pid=139388) 2023-01-09 17:17:22,012 DEM          INFO     Computing DEM for Topic3
(DEM_internal_ray pid=139387) 2023-01-09 17:17:22,777 DEM          INFO     Computing DEM for Topic4
(DEM_internal_ray pid=139391) 2023-01-09 17:17:22,720 DEM          INFO     Computing DEM for Topic5
(DEM_internal_ray pid=139384) 2023-01-09 17:17:22,802 DEM          INFO     Computing DEM for Topic6
(DEM_internal_ray pid=139386) 2023-01-09 17:17:23,193 DEM          INFO     Computing DEM for Topic7
(DEM_internal_ray pid=139385) 2023-01-09 17:17:23,242 DEM          INFO     Computing DEM for Topic8
(DEM_internal_ray pid=139388) 2023-01-09 17:17:27,334 DEM          INFO     Computing DEM for Topic9
(DEM_internal_ray pid=139387) 2023-01-09 17:17:28,507 DEM          INFO     Computing DEM for Topic10
(DEM_internal_ray pid=139390) 2023-01-09 17:17:28,694 DEM          INFO     Computing DEM for Topic11
(DEM_internal_ray pid=139385) 2023-01-09 17:17:29,732 DEM          INFO     Computing DEM for Topic12
(DEM_internal_ray pid=139391) 2023-01-09 17:17:31,412 DEM          INFO     Computing DEM for Topic13
(DEM_internal_ray pid=139384) 2023-01-09 17:17:31,561 DEM          INFO     Computing DEM for Topic14
(DEM_internal_ray pid=139389) 2023-01-09 17:17:32,458 DEM          INFO     Computing DEM for Topic15
(DEM_internal_ray pid=139390) 2023-01-09 17:17:34,263 DEM          INFO     Computing DEM for Topic16
(DEM_internal_ray pid=139386) 2023-01-09 17:17:34,882 DEM          INFO     Computing DEM for Topic17
(DEM_internal_ray pid=139385) 2023-01-09 17:17:35,623 DEM          INFO     Computing DEM for Topic18
(DEM_internal_ray pid=139384) 2023-01-09 17:17:37,134 DEM          INFO     Computing DEM for Topic19
(DEM_internal_ray pid=139387) 2023-01-09 17:17:45,269 DEM          INFO     Computing DEM for Topic20
(DEM_internal_ray pid=139386) 2023-01-09 17:17:45,445 DEM          INFO     Computing DEM for Topic21
(DEM_internal_ray pid=139384) 2023-01-09 17:17:46,820 DEM          INFO     Computing DEM for Topic22
(DEM_internal_ray pid=139390) 2023-01-09 17:17:48,670 DEM          INFO     Computing DEM for Topic23
(DEM_internal_ray pid=139391) 2023-01-09 17:17:49,197 DEM          INFO     Computing DEM for Topic24
(DEM_internal_ray pid=139388) 2023-01-09 17:17:49,289 DEM          INFO     Computing DEM for Topic25
(DEM_internal_ray pid=139384) 2023-01-09 17:17:53,272 DEM          INFO     Computing DEM for Topic26
2023-01-09 17:18:11,647 DEM          INFO     Forming cistromes
2023-01-09 17:18:18,833 DEM          INFO     Done!
2023-01-09 17:18:23,679 pycisTarget_wrapper INFO     Created folder : results/motifs/DEM_topics_top_3_All
2023-01-09 17:18:24,360 pycisTarget_wrapper INFO     Loading cisTarget database for DARs
2023-01-09 17:18:24,362 cisTarget    INFO     Reading cisTarget database

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[77], line 2
      1 from scenicplus.wrappers.run_pycistarget import run_pycistarget
----> 2 run_pycistarget(
      3     region_sets = region_sets,
      4     species = 'homo_sapiens',
      5     save_path = 'results/motifs',
      6     ctx_db_path = rankings_db,
      7     dem_db_path = scores_db,
      8     path_to_motif_annotations = motif_annotation,
      9     #run_without_promoters = True,
     10     n_cpu = 8,
     11     _temp_dir = '/users/sen2qb/symlinks/temp_d_d/ray_spill',
     12     annotation_version = 'v10nr_clust',
     13     )

File ~/testing_area/scenicplus/src/scenicplus/wrappers/run_pycistarget.py:182, in run_pycistarget(region_sets, species, save_path, custom_annot, save_partial, ctx_db_path, dem_db_path, run_without_promoters, biomart_host, promoter_space, ctx_auc_threshold, ctx_nes_threshold, ctx_rank_threshold, dem_log2fc_thr, dem_motif_hit_thr, dem_max_bg_regions, annotation, motif_similarity_fdr, path_to_motif_annotations, annotation_version, n_cpu, _temp_dir, exclude_motifs, exclude_collection, **kwargs)
    180 ## CISTARGET
    181 regions = region_sets[key]
--> 182 ctx_db = cisTargetDatabase(ctx_db_path, regions)  
    183 if exclude_motifs is not None:
    184     out = pd.read_csv(exclude_motifs, header=None).iloc[:,0].tolist()

File ~/testing_area/pycistarget/pycistarget/motif_enrichment_cistarget.py:67, in cisTargetDatabase.__init__(self, fname, region_sets, name, fraction_overlap)
     48 def __init__(self, 
     49             fname: str,
     50             region_sets: Union[Dict[str, pr.PyRanges], pr.PyRanges] = None,
     51             name: str = None,
     52             fraction_overlap: float = 0.4):
     53     """
     54     Initialize cisTargetDatabase
     55     
   (...)
     65         Minimal overlap between query and regions in the database for the mapping.     
     66     """
---> 67     self.regions_to_db, self.db_rankings, self.total_regions = self.load_db(fname,
     68                                                       region_sets,
     69                                                       name,
     70                                                       fraction_overlap)

File ~/testing_area/pycistarget/pycistarget/motif_enrichment_cistarget.py:131, in cisTargetDatabase.load_db(self, fname, region_sets, name, fraction_overlap)
    129 if prefix is not None:
    130     target_regions_in_db = [prefix + '__' + x for x in target_regions_in_db]
--> 131 target_regions_in_db = GeneSignature(name=name, gene2weight=target_regions_in_db)
    132 db_rankings = db.load(target_regions_in_db)
    133 if prefix is not None:

File <attrs generated init ctxcore.genesig.GeneSignature>:8, in __init__(self, name, gene2weight)
      6 if _config._run_validators is True:
      7     __attr_validator_name(self, __attr_name, self.name)
----> 8     __attr_validator_gene2weight(self, __attr_gene2weight, self.gene2weight)

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/ctxcore/genesig.py:172, in GeneSignature.gene2weight_validator(self, attribute, value)
    169 @gene2weight.validator
    170 def gene2weight_validator(self, attribute, value) -> None:
    171     if len(value) == 0:
--> 172         raise ValueError("A gene signature must have at least one gene.")

ValueError: A gene signature must have at least one gene.

I looked at other error reports, namely https://github.com/aertslab/scenicplus/issues/60

and tried the same command with save_partial = True and run_without_promoters = True.

I get the same error - error log

Would it be possible to create the scenic object and run the scenic+ function with a couple of the partial result pickle files e.g. CTX_topics_otsu_All.pkl instead of menr.pkl?

Thanks! Sid.

SeppeDeWinter commented 1 year ago

Hi @sid5427

From the error I suspect that region_sets['DARs'] might be empty or contain empty entries. Could you show the output of region_sets['DARs'] to confirm this?
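One way to check this, as a rough sketch: the helper below (`find_empty_region_sets` is a hypothetical name, not part of scenicplus or pycistarget) walks a nested dictionary shaped like region_sets and reports every entry whose length is zero. It only assumes that each entry supports `len()`, which PyRanges objects do; the demo uses plain lists as stand-ins.

```python
def find_empty_region_sets(region_sets):
    """Return (group, name) pairs whose region set is empty."""
    empty = []
    for group, sets in region_sets.items():
        if len(sets) == 0:
            # The whole group (e.g. 'DARs') has no entries at all.
            empty.append((group, None))
            continue
        for name, regions in sets.items():
            if len(regions) == 0:
                empty.append((group, name))
    return empty


# Stand-in data: lists of region names instead of PyRanges objects.
demo = {
    "topics_otsu": {"Topic1": ["chr1:100-600"], "Topic2": ["chr2:50-550"]},
    "DARs": {"cluster_A": [], "cluster_B": ["chr3:10-510"]},
}
print(find_empty_region_sets(demo))  # [('DARs', 'cluster_A')]
```

An empty list back from this check would suggest the problem lies elsewhere, for example in how the query regions overlap the database regions.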

On your question of whether it is possible to run SCENIC+ with only some of the partial results: yes, it is. You can build the menr dictionary like this (in your case):


import dill

# Load each partial result that was written before the run failed
CTX_topics_otsu_All = dill.load(open('results/motifs/CTX_topics_otsu_All.pkl', 'rb'))
DEM_topics_otsu_All = dill.load(open('results/motifs/DEM_topics_otsu_All.pkl', 'rb'))
CTX_topics_top_3_All = dill.load(open('results/motifs/CTX_topics_top_3_All.pkl', 'rb'))
DEM_topics_top_3_All = dill.load(open('results/motifs/DEM_topics_top_3_All.pkl', 'rb'))

# Collect them in the menr dictionary (it has to exist before assigning keys)
menr = {}
menr['CTX_topics_otsu_All'] = CTX_topics_otsu_All
menr['DEM_topics_otsu_All'] = DEM_topics_otsu_All
menr['CTX_topics_top_3_All'] = CTX_topics_top_3_All
menr['DEM_topics_top_3_All'] = DEM_topics_top_3_All
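
The same idea can be written as a loop, sketched here under a few assumptions: the partial results sit in one folder as `<name>.pkl`, and `load_partial_menr` is a hypothetical helper, not a scenicplus function. It skips files that are missing, so a run that failed partway still yields a usable dictionary. `dill` is used when available, as in the snippet above, with the standard-library `pickle` as a fallback so the demo runs without it.

```python
import os
import pickle
import tempfile

try:
    import dill as serializer  # the partial results above are loaded with dill
except ImportError:
    serializer = pickle  # fallback so the sketch runs without dill


def load_partial_menr(folder, names):
    """Load '<folder>/<name>.pkl' for each name, skipping files that do not exist."""
    menr = {}
    for name in names:
        path = os.path.join(folder, name + ".pkl")
        if not os.path.exists(path):
            continue
        with open(path, "rb") as handle:
            menr[name] = serializer.load(handle)
    return menr


# Demo with a temporary folder and a dummy payload instead of real results.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "CTX_topics_otsu_All.pkl"), "wb") as handle:
        serializer.dump({"Topic1": "dummy result"}, handle)
    demo_menr = load_partial_menr(tmp, ["CTX_topics_otsu_All", "CTX_DARs_All"])

print(sorted(demo_menr))  # ['CTX_topics_otsu_All']
```

With the real files, the call would be along the lines of `load_partial_menr('results/motifs', [...])` with the four names listed above.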

Best,

Seppe

sid5427 commented 1 year ago

Hi Seppe,

That's the weird part - when I run the code section for finding DARs in markers_dict

for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    #print(regions)
    region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
    print("pr.PyRanges(region_names_to_coordinates(regions))")

I get this error -

pr.PyRanges(region_names_to_coordinates(regions))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 4
      2 regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
      3 #print(regions)
----> 4 region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
      5 print("pr.PyRanges(region_names_to_coordinates(regions))")

File ~/testing_area/pycistarget/pycistarget/utils.py:33, in region_names_to_coordinates(region_names)
     31 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)
     32 regiondf.index=[i for i in region_names if ':' in i]
---> 33 regiondf.columns=['Chromosome', 'Start', 'End']
     34 return(regiondf)

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/generic.py:5915, in NDFrame.__setattr__(self, name, value)
   5913 try:
   5914     object.__getattribute__(self, name)
-> 5915     return object.__setattr__(self, name, value)
   5916 except AttributeError:
   5917     pass

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/_libs/properties.pyx:69, in pandas._libs.properties.AxisProperty.__set__()

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/generic.py:823, in NDFrame._set_axis(self, axis, labels)
    821 def _set_axis(self, axis: int, labels: AnyArrayLike | list) -> None:
    822     labels = ensure_index(labels)
--> 823     self._mgr.set_axis(axis, labels)
    824     self._clear_item_cache()

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/internals/managers.py:227, in BaseBlockManager.set_axis(self, axis, new_labels)
    225 def set_axis(self, axis: int, new_labels: Index) -> None:
    226     # Caller is responsible for ensuring we have an Index object.
--> 227     self._validate_set_axis(axis, new_labels)
    228     self.axes[axis] = new_labels

File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/internals/base.py:70, in DataManager._validate_set_axis(self, axis, new_labels)
     67     pass
     69 elif new_len != old_len:
---> 70     raise ValueError(
     71         f"Length mismatch: Expected axis has {old_len} elements, new "
     72         f"values have {new_len} elements"
     73     )

ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements

However if I run region_sets['DARs'] after that - I get this -

{'BMCP': +--------------+-----------+-----------+
 | Chromosome   | Start     | End       |
 | (category)   | (int32)   | (int32)   |
 |--------------+-----------+-----------|
 | chr1         | 21353367  | 21353867  |
 | chr1         | 27542533  | 27543033  |
 | chr1         | 147377812 | 147378312 |
 | chr1         | 186195347 | 186195847 |
 | ...          | ...       | ...       |
 | chrX         | 130179564 | 130180064 |
 | chrX         | 129957183 | 129957683 |
 | chrX         | 109848273 | 109848773 |
 | chrX         | 41257752  | 41258252  |
 +--------------+-----------+-----------+
 Unstranded PyRanges object has 3,635 rows and 3 columns from 23 chromosomes.
 For printing, the PyRanges was sorted on Chromosome.}
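
A likely reading of this output: the loop raised at the first empty entry in markers_dict, so only the keys processed before it (here 'BMCP') were stored. The toy sketch below reproduces that behaviour in plain Python. to_coordinates is a hypothetical stand-in that fails on empty input; it is not the pycistarget function.

```python
def to_coordinates(region_names):
    """Toy stand-in: parse 'chr1:100-200' names, failing on empty input."""
    if len(region_names) == 0:
        raise ValueError("Length mismatch: no regions to parse")
    coords = []
    for name in region_names:
        chrom, span = name.split(":")
        start, end = span.split("-")
        coords.append((chrom, int(start), int(end)))
    return coords


# Made-up miniature of markers_dict: the second entry is empty.
markers = {
    "BMCP": ["chr1:21353367-21353867"],
    "CD14-Mono": [],
    "CLP": ["chr7:2723132-2723632"],
}

region_sets = {"DARs": {}}
failed_at = None
try:
    for name, regions in markers.items():
        region_sets["DARs"][name] = to_coordinates(regions)
except ValueError as err:
    failed_at = name
    print(f"stopped at {failed_at!r}: {err}")

# Only the entry processed before the failure was stored.
print(list(region_sets["DARs"]))  # ['BMCP']
```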

I went ahead and printed markers_dict, and this is what I get. It looks like scenic does not detect markers for certain cell types, even though the call that produces it, markers_dict = find_diff_features(cistopic_obj, imputed_acc_obj, variable='celltype', var_features=variable_regions, split_pattern = '-'), completes successfully:

{'BMCP':                             Log2FC Adjusted_pval Contrast
chr8:73520503-73521003    4.247113           0.0     BMCP
chr1:21353367-21353867    4.242884           0.0     BMCP
chr11:44780868-44781368   4.219286           0.0     BMCP
chr13:44397846-44398346   4.167223           0.0     BMCP
chr1:27542533-27543033    4.164805           0.0     BMCP
...                            ...           ...      ...
chr22:38768855-38769355   0.586214           0.0     BMCP
chr7:15977629-15978129    0.585935           0.0     BMCP
chr5:88884545-88885045    0.585763           0.0     BMCP
chr5:150129881-150130381  0.585325           0.0     BMCP
chr3:195853922-195854422   0.58524           0.0     BMCP

[3636 rows x 3 columns], 'CD14-Mono': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [], 'CD34 Gran-ATAC':                             Log2FC Adjusted_pval        Contrast
chr12:2294663-2295163     4.291807           0.0  CD34 Gran-ATAC
chrX:15911664-15912164     4.25781           0.0  CD34 Gran-ATAC
chr3:128523503-128524003  4.191673           0.0  CD34 Gran-ATAC
chr9:129058106-129058606  4.185681           0.0  CD34 Gran-ATAC
chr13:28468022-28468522   4.104304           0.0  CD34 Gran-ATAC
...                            ...           ...             ...
chr9:127967948-127968448  0.585811      0.000347  CD34 Gran-ATAC
chrX:56813483-56813983    0.585558      0.000001  CD34 Gran-ATAC
chr14:35286835-35287335   0.585539      0.000006  CD34 Gran-ATAC
chr11:72787723-72788223   0.585364      0.000001  CD34 Gran-ATAC
chr2:20424749-20425249     0.58513           0.0  CD34 Gran-ATAC

[3357 rows x 3 columns], 'CLP':                              Log2FC Adjusted_pval Contrast
chr2:231672428-231672928   2.806754      0.001634      CLP
chr16:29339242-29339742     2.58527       0.00002      CLP
chr7:2723132-2723632        2.58264       0.00002      CLP
chr22:22935612-22936112    2.513794       0.00002      CLP
chr12:110647657-110648157  2.494059       0.00002      CLP
...                             ...           ...      ...
chr19:47336760-47337260     0.58605       0.01005      CLP
chr2:101474036-101474536    0.58595      0.000616      CLP
chr11:104653394-104653894  0.585924      0.022442      CLP
chr12:89663417-89663917    0.585352      0.045297      CLP
chr6:150142812-150143312   0.585031      0.000074      CLP

[4583 rows x 3 columns], 'ERP':                             Log2FC Adjusted_pval Contrast
chr1:21353367-21353867    4.939022           0.0      ERP
chr8:73520503-73521003     4.93843           0.0      ERP
chr11:44780868-44781368    4.91324           0.0      ERP
chr13:44397846-44398346   4.868945           0.0      ERP
chr1:27542533-27543033    4.864803           0.0      ERP
...                            ...           ...      ...
chr1:179586214-179586714  0.586495           0.0      ERP
chr17:36482888-36483388   0.586426           0.0      ERP
chr12:79882960-79883460   0.586234      0.000066      ERP
chr21:34201754-34202254   0.586081           0.0      ERP
chr7:138400431-138400931  0.585121      0.000039      ERP

[3377 rows x 3 columns], 'HSC CACNB2':                             Log2FC Adjusted_pval    Contrast
chr17:13347389-13347889   0.914718           0.0  HSC CACNB2
chr3:152977353-152977853  0.900724           0.0  HSC CACNB2
chrX:72269788-72270288    0.897633           0.0  HSC CACNB2
chr5:119274871-119275371  0.897493           0.0  HSC CACNB2
chr4:155543343-155543843  0.897219           0.0  HSC CACNB2
...                            ...           ...         ...
chr15:34855325-34855825   0.585576           0.0  HSC CACNB2
chr15:52483673-52484173   0.585511           0.0  HSC CACNB2
chr10:71776295-71776795   0.585307           0.0  HSC CACNB2
chr2:15640184-15640684    0.585199           0.0  HSC CACNB2
chr21:45364168-45364668   0.585058           0.0  HSC CACNB2

[658 rows x 3 columns], 'HSC HIST1H2AC':                             Log2FC Adjusted_pval       Contrast
chrX:13470339-13470839    1.188423           0.0  HSC HIST1H2AC
chr2:108288396-108288896   1.17873           0.0  HSC HIST1H2AC
chrX:98398402-98398902    1.174786           0.0  HSC HIST1H2AC
chr5:45507129-45507629    1.171706           0.0  HSC HIST1H2AC
chr6:170506315-170506815  1.168462           0.0  HSC HIST1H2AC
...                            ...           ...            ...
chr8:28347974-28348474    0.585566           0.0  HSC HIST1H2AC
chr12:60144359-60144859   0.585439           0.0  HSC HIST1H2AC
chr8:144292302-144292802  0.585366           0.0  HSC HIST1H2AC
chr17:61252140-61252640    0.58524           0.0  HSC HIST1H2AC
chr14:92586155-92586655   0.585017           0.0  HSC HIST1H2AC

[2866 rows x 3 columns], 'HSC MYADM-CD97':                             Log2FC Adjusted_pval        Contrast
chr17:13347389-13347889   1.627802           0.0  HSC MYADM-CD97
chrX:72269788-72270288    1.593157           0.0  HSC MYADM-CD97
chr1:52139793-52140293    1.588745           0.0  HSC MYADM-CD97
chr1:207282009-207282509  1.573321           0.0  HSC MYADM-CD97
chr12:81668404-81668904   1.564987           0.0  HSC MYADM-CD97
...                            ...           ...             ...
chr9:121351196-121351696  0.585478           0.0  HSC MYADM-CD97
chr18:70781827-70782327   0.585219           0.0  HSC MYADM-CD97
chr2:20727942-20728442    0.585111           0.0  HSC MYADM-CD97
chr12:52259754-52260254   0.585101           0.0  HSC MYADM-CD97
chr17:63676658-63677158   0.585022           0.0  HSC MYADM-CD97

[2861 rows x 3 columns], 'HSC WNT11':                             Log2FC Adjusted_pval   Contrast
chr20:34583552-34584052   0.897686           0.0  HSC WNT11
chr1:157465990-157466490  0.857838           0.0  HSC WNT11
chr17:69703839-69704339   0.854879           0.0  HSC WNT11
chr16:86415293-86415793   0.836379           0.0  HSC WNT11
chr9:99833712-99834212    0.836237           0.0  HSC WNT11
...                            ...           ...        ...
chr5:45507129-45507629    0.585922           0.0  HSC WNT11
chr18:11652338-11652838   0.585842           0.0  HSC WNT11
chr2:46866827-46867327    0.585681           0.0  HSC WNT11
chr2:219182404-219182904  0.585369           0.0  HSC WNT11
chrX:112889319-112889819  0.585211           0.0  HSC WNT11

[232 rows x 3 columns], 'LMPP CDK6-FLT3':                              Log2FC Adjusted_pval        Contrast
chr17:60402734-60403234    1.689114           0.0  LMPP CDK6-FLT3
chr5:157867743-157868243   1.673037           0.0  LMPP CDK6-FLT3
chr6:119523493-119523993   1.670789           0.0  LMPP CDK6-FLT3
chr3:139202264-139202764   1.666878           0.0  LMPP CDK6-FLT3
chr5:98922016-98922516     1.653112           0.0  LMPP CDK6-FLT3
...                             ...           ...             ...
chr3:46926890-46927390     0.585351           0.0  LMPP CDK6-FLT3
chr19:41378266-41378766    0.585168           0.0  LMPP CDK6-FLT3
chr8:101314062-101314562    0.58502           0.0  LMPP CDK6-FLT3
chr22:29079289-29079789    0.584999           0.0  LMPP CDK6-FLT3
chr11:123484430-123484930  0.584964           0.0  LMPP CDK6-FLT3

[4573 rows x 3 columns], 'LMPP LSAMP':                             Log2FC Adjusted_pval    Contrast
chr19:28388949-28389449   2.962269           0.0  LMPP LSAMP
chr2:108776418-108776918  2.960341           0.0  LMPP LSAMP
chr5:158825148-158825648  2.957966           0.0  LMPP LSAMP
chr10:33715201-33715701   2.956146           0.0  LMPP LSAMP
chr3:29030843-29031343    2.953672           0.0  LMPP LSAMP
...                            ...           ...         ...
chr13:41916484-41916984   0.585829           0.0  LMPP LSAMP
chr3:122271735-122272235  0.585776           0.0  LMPP LSAMP
chr20:19943058-19943558   0.585691           0.0  LMPP LSAMP
chr11:44611700-44612200   0.585415           0.0  LMPP LSAMP
chr15:38817713-38818213    0.58509           0.0  LMPP LSAMP

[5472 rows x 3 columns], 'LMPP Naive T-cell':                             Log2FC Adjusted_pval           Contrast
chr2:231672428-231672928  5.046681           0.0  LMPP Naive T-cell
chr17:57552218-57552718   4.440933           0.0  LMPP Naive T-cell
chr2:234164455-234164955  4.180235           0.0  LMPP Naive T-cell
chr22:44025612-44026112   4.119872           0.0  LMPP Naive T-cell
chr11:65639492-65639992    4.04134           0.0  LMPP Naive T-cell
...                            ...           ...                ...
chr6:24666804-24667304    0.586347           0.0  LMPP Naive T-cell
chr22:48098172-48098672   0.586282           0.0  LMPP Naive T-cell
chr7:111408651-111409151  0.585472      0.000576  LMPP Naive T-cell
chr9:99129969-99130469    0.585327      0.000062  LMPP Naive T-cell
chr2:127829811-127830311  0.584968      0.000556  LMPP Naive T-cell

[1764 rows x 3 columns], 'LMPP PRSS1':                             Log2FC Adjusted_pval    Contrast
chr10:33715201-33715701   2.414107           0.0  LMPP PRSS1
chr3:29030843-29031343    2.412636           0.0  LMPP PRSS1
chr21:38525418-38525918   2.412636           0.0  LMPP PRSS1
chr19:28388949-28389449   2.411385           0.0  LMPP PRSS1
chr20:53686072-53686572   2.410341           0.0  LMPP PRSS1
...                            ...           ...         ...
chr1:212596123-212596623  0.585835           0.0  LMPP PRSS1
chr19:19451403-19451903   0.585529           0.0  LMPP PRSS1
chr1:43836421-43836921    0.585337           0.0  LMPP PRSS1
chr6:31351814-31352314    0.585109           0.0  LMPP PRSS1
chr3:69092014-69092514    0.585054           0.0  LMPP PRSS1

[5788 rows x 3 columns], 'LT-HSC HLF':                              Log2FC Adjusted_pval    Contrast
chr5:45507129-45507629     1.531193           0.0  LT-HSC HLF
chr6:170506315-170506815   1.528928           0.0  LT-HSC HLF
chr10:13133034-13133534    1.509826           0.0  LT-HSC HLF
chr22:37613498-37613998    1.505232           0.0  LT-HSC HLF
chr12:103540329-103540829  1.502772           0.0  LT-HSC HLF
...                             ...           ...         ...
chr14:24313086-24313586    0.585542           0.0  LT-HSC HLF
chr20:18589905-18590405    0.585467           0.0  LT-HSC HLF
chr17:75864253-75864753    0.585419           0.0  LT-HSC HLF
chr12:66776806-66777306    0.585236           0.0  LT-HSC HLF
chr8:109745381-109745881   0.585025           0.0  LT-HSC HLF

[3537 rows x 3 columns], 'MDP-2 GPR133':                              Log2FC Adjusted_pval      Contrast
chr2:231672428-231672928   2.663041       0.00004  MDP-2 GPR133
chr16:29339242-29339742    2.645864      0.000032  MDP-2 GPR133
chr7:2723132-2723632       2.619371      0.000032  MDP-2 GPR133
chr12:110647657-110648157  2.560181      0.000032  MDP-2 GPR133
chr22:22935612-22936112    2.539039      0.000032  MDP-2 GPR133
...                             ...           ...           ...
chr19:3849137-3849637      0.585687      0.023213  MDP-2 GPR133
chr10:74058584-74059084    0.585513      0.000168  MDP-2 GPR133
chr3:184492115-184492615   0.585447      0.001824  MDP-2 GPR133
chrX:114268580-114269080   0.585185      0.002259  MDP-2 GPR133
chr7:139811046-139811546   0.585112      0.000575  MDP-2 GPR133

[4048 rows x 3 columns], 'MDP-pDC':                              Log2FC Adjusted_pval Contrast
chr2:231672428-231672928   5.797586           0.0  MDP-pDC
chr17:57552218-57552718    5.176988           0.0  MDP-pDC
chr2:234164455-234164955    4.96483           0.0  MDP-pDC
chr22:44025612-44026112    4.882341           0.0  MDP-pDC
chr11:65639492-65639992    4.872473           0.0  MDP-pDC
...                             ...           ...      ...
chr19:3324341-3324841      0.585875      0.000001  MDP-pDC
chr8:38830617-38831117      0.58583      0.000004  MDP-pDC
chr13:98484715-98485215    0.585555      0.006703  MDP-pDC
chr12:120437801-120438301  0.585127      0.000002  MDP-pDC
chr12:132558281-132558781  0.585089      0.000273  MDP-pDC

[4951 rows x 3 columns], 'MEP-MKP':                            Log2FC Adjusted_pval Contrast
chr8:73520503-73521003    3.73497           0.0  MEP-MKP
chr1:21353367-21353867    3.70901           0.0  MEP-MKP
chr11:44780868-44781368   3.69003           0.0  MEP-MKP
chr1:27542533-27543033   3.619486           0.0  MEP-MKP
chr13:44397846-44398346  3.613465           0.0  MEP-MKP
...                           ...           ...      ...
chr1:31162607-31163107   0.586068           0.0  MEP-MKP
chr7:94395369-94395869   0.585895           0.0  MEP-MKP
chr4:74134146-74134646   0.585226           0.0  MEP-MKP
chr17:29117081-29117581  0.585171           0.0  MEP-MKP
chr19:41367776-41368276  0.584975           0.0  MEP-MKP

[3786 rows x 3 columns], 'ML-Gran':                             Log2FC Adjusted_pval Contrast
chr2:239814585-239815085  1.545085           0.0  ML-Gran
chr7:2214382-2214882       1.53738           0.0  ML-Gran
chr10:11901863-11902363    1.43475           0.0  ML-Gran
chr6:5162341-5162841      1.428097           0.0  ML-Gran
chr4:6888842-6889342      1.397363           0.0  ML-Gran
...                            ...           ...      ...
chr22:35602121-35602621   0.585847           0.0  ML-Gran
chr11:93718133-93718633    0.58579           0.0  ML-Gran
chr16:84913249-84913749   0.585337           0.0  ML-Gran
chr11:1152253-1152753     0.585231           0.0  ML-Gran
chr2:88858054-88858554    0.585031           0.0  ML-Gran

[1035 rows x 3 columns], 'MPP Ribo-high': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [], 'MPP SPINK2-CD99': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [], 'MultiLin-ATAC':                             Log2FC Adjusted_pval       Contrast
chr12:2294663-2295163     1.908098           0.0  MultiLin-ATAC
chrX:15911664-15912164    1.876974           0.0  MultiLin-ATAC
chr3:128523503-128524003  1.864965           0.0  MultiLin-ATAC
chr9:129058106-129058606  1.818677           0.0  MultiLin-ATAC
chr14:59361084-59361584   1.807945           0.0  MultiLin-ATAC
...                            ...           ...            ...
chr8:100577336-100577836  0.585536           0.0  MultiLin-ATAC
chr17:82334744-82335244   0.585499           0.0  MultiLin-ATAC
chr10:30049091-30049591   0.585449           0.0  MultiLin-ATAC
chr15:89820641-89821141    0.58526           0.0  MultiLin-ATAC
chr16:68284770-68285270   0.585148           0.0  MultiLin-ATAC

[2195 rows x 3 columns], 'ST-HSC PBX1':                             Log2FC Adjusted_pval     Contrast
chr16:59048819-59049319    0.69066           0.0  ST-HSC PBX1
chr1:100628676-100629176  0.689165           0.0  ST-HSC PBX1
chr2:195047452-195047952  0.686393           0.0  ST-HSC PBX1
chr9:3825541-3826041      0.686267           0.0  ST-HSC PBX1
chr15:35595622-35596122   0.685384           0.0  ST-HSC PBX1
...                            ...           ...          ...
chr4:21041593-21042093    0.585442           0.0  ST-HSC PBX1
chr1:209925957-209926457  0.585398           0.0  ST-HSC PBX1
chr1:169465253-169465753  0.585347           0.0  ST-HSC PBX1
chr13:98454535-98455035   0.585183           0.0  ST-HSC PBX1
chr18:36168255-36168755   0.585096           0.0  ST-HSC PBX1

[1022 rows x 3 columns], 'pre-Gran CP':                             Log2FC Adjusted_pval     Contrast
chr12:2294663-2295163      3.18217           0.0  pre-Gran CP
chrX:15911664-15912164    3.137117           0.0  pre-Gran CP
chr3:128523503-128524003   3.08506           0.0  pre-Gran CP
chr9:129058106-129058606  3.076241           0.0  pre-Gran CP
chr13:28468022-28468522   3.014072           0.0  pre-Gran CP
...                            ...           ...          ...
chr6:117547619-117548119  0.585609           0.0  pre-Gran CP
chr4:146243417-146243917  0.585519           0.0  pre-Gran CP
chr3:50626386-50626886    0.585345           0.0  pre-Gran CP
chr18:73768053-73768553   0.585049           0.0  pre-Gran CP
chr6:10603219-10603719    0.584966           0.0  pre-Gran CP

[3673 rows x 3 columns], 'pre-MEP':                             Log2FC Adjusted_pval Contrast
chr10:71980251-71980751   1.818789           0.0  pre-MEP
chr10:12328693-12329193   1.813141           0.0  pre-MEP
chr3:189890471-189890971  1.809711           0.0  pre-MEP
chr14:29650519-29651019   1.807064           0.0  pre-MEP
chr9:591201-591701        1.798752           0.0  pre-MEP
...                            ...           ...      ...
chr8:84626097-84626597     0.58563           0.0  pre-MEP
chr2:126368715-126369215   0.58541           0.0  pre-MEP
chr6:87723209-87723709    0.585265           0.0  pre-MEP
chr16:19130706-19131206   0.585088           0.0  pre-MEP
chr11:32056360-32056860   0.585063           0.0  pre-MEP

[3227 rows x 3 columns], 'pre-PC':                             Log2FC Adjusted_pval Contrast
chr2:231672428-231672928  6.236564           0.0   pre-PC
chr17:57552218-57552718   5.575458           0.0   pre-PC
chr2:234164455-234164955  5.337587           0.0   pre-PC
chr22:44025612-44026112   5.195669           0.0   pre-PC
chr11:65639492-65639992   5.120273           0.0   pre-PC
...                            ...           ...      ...
chr12:11650900-11651400   0.588236      0.000293   pre-PC
chr13:30114859-30115359   0.586593           0.0   pre-PC
chr9:129459108-129459608  0.586559           0.0   pre-PC
chr1:92485186-92485686    0.586446           0.0   pre-PC
chr19:41530845-41531345   0.586325           0.0   pre-PC

[2235 rows x 3 columns]}

SeppeDeWinter commented 1 year ago

Hi @sid5427

Yes indeed, it's these empty dataframes in markers_dict that are causing the error (i.e. 'CD14-Mono', 'MPP Ribo-high' and 'MPP SPINK2-CD99').

You should remove these before running:


for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    #print(regions)
    region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))

You can also do it like this


for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    if len(regions) > 0:
        region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
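
Another option is to drop the empty entries up front and keep a record of what was dropped. This is a minimal sketch; drop_empty_markers is a hypothetical helper name, and it only relies on len() of each entry, which for a pandas DataFrame is its number of rows.

```python
def drop_empty_markers(markers_dict):
    """Split a dict of marker tables into non-empty entries and dropped names."""
    kept = {name: df for name, df in markers_dict.items() if len(df) > 0}
    dropped = [name for name in markers_dict if name not in kept]
    return kept, dropped


# Plain lists stand in for the DataFrames here.
example = {"BMCP": ["chr1:21353367-21353867"], "CD14-Mono": [], "CLP": ["chr7:2723132-2723632"]}
kept, dropped = drop_empty_markers(example)
print(dropped)  # ['CD14-Mono']
```

Note that this checks emptiness before the 'chr' filter in the loop, so the len(regions) > 0 guard inside the loop is still useful for entries that only contain regions on other contigs.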

The reason that these dataframes are empty is that no regions passed the thresholds (by default, a fold change of 1.5, i.e. a log2 fold change of about 0.585, and an adjusted p-value < 0.05). You can also change these thresholds in the find_diff_features function to get more regions.
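
For reference, a fold change of 1.5 corresponds to a log2 fold change of about 0.585, which matches the smallest Log2FC values in the markers_dict output above. The short sketch below works through that arithmetic. passes_thresholds is a hypothetical illustration of the filter, not the pycisTopic implementation, and whether the real comparison is strict or inclusive is an assumption.

```python
import math

fold_change = 1.5
log2fc_cutoff = math.log2(fold_change)
print(round(log2fc_cutoff, 3))  # 0.585


def passes_thresholds(log2fc, adj_pval, fold_change=1.5, max_adj_pval=0.05):
    """Hypothetical filter: keep a region if both thresholds are met."""
    return log2fc >= math.log2(fold_change) and adj_pval < max_adj_pval


# Values taken from the output above: just above the cutoff, so kept.
print(passes_thresholds(0.584964, 0.0))  # True
# Made-up values: a log2FC that is too small, and a p-value that is too high.
print(passes_thresholds(0.5, 0.0))       # False
print(passes_thresholds(1.0, 0.1))       # False
```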

Best,

Seppe

sid5427 commented 1 year ago

Hi Seppe,

Thanks for the solution - I'll incorporate that into my run. I had tried this to remove the three troublesome clusters -

##remove clusters CD14-Mono, MPP Ribo-high, MPP SPINK2-CD99
adata_filtered = adata[adata.obs['cell_type'] != 'MPP Ribo-high' ] #MPP Ribo-high
adata_filtered = adata_filtered[adata_filtered.obs['cell_type'] != 'CD14-Mono' ] #CD14-Mono
adata_filtered = adata_filtered[adata_filtered.obs['cell_type'] != 'MPP SPINK2-CD99' ] #MPP SPINK2-CD99
adata_filtered.obs.cell_type
adata = adata_filtered ##replace original adata with filtered one
del(adata_filtered)

This did work, and it generated a scenicplus object with some of the downstream figures. However I get an error later for this part -

from scenicplus.cistromes import TF_cistrome_correlation, generate_pseudobulks

generate_pseudobulks(
        scplus_obj = scplus_obj,
        variable = 'GEX_cell_type',
        auc_key = 'eRegulon_AUC_filtered',
        signature_key = 'Gene_based')
generate_pseudobulks(
        scplus_obj = scplus_obj,
        variable = 'GEX_cell_type',
        auc_key = 'eRegulon_AUC_filtered',
        signature_key = 'Region_based')

TF_cistrome_correlation(
            scplus_obj,
            use_pseudobulk = True,
            variable = 'GEX_cell_type',
            auc_key = 'eRegulon_AUC_filtered',
            signature_key = 'Gene_based',
            out_key = 'filtered_gene_based')
TF_cistrome_correlation(
            scplus_obj,
            use_pseudobulk = True,
            variable = 'GEX_cell_type',
            auc_key = 'eRegulon_AUC_filtered',
            signature_key = 'Region_based',
            out_key = 'filtered_region_based')

and this is the error -

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[19], line 3
      1 from scenicplus.cistromes import TF_cistrome_correlation, generate_pseudobulks
----> 3 generate_pseudobulks(
      4         scplus_obj = scplus_obj,
      5         variable = 'GEX_cell_type',
      6         auc_key = 'eRegulon_AUC_filtered',
      7         signature_key = 'Gene_based')
      8 generate_pseudobulks(
      9         scplus_obj = scplus_obj,
     10         variable = 'GEX_cell_type',
     11         auc_key = 'eRegulon_AUC_filtered',
     12         signature_key = 'Region_based')
     14 TF_cistrome_correlation(
     15             scplus_obj,
     16             use_pseudobulk = True,
   (...)
     19             signature_key = 'Gene_based',
     20             out_key = 'filtered_gene_based')

File ~/testing_area/scenicplus/src/scenicplus/cistromes.py:227, in generate_pseudobulks(scplus_obj, variable, normalize_expression, auc_key, signature_key, nr_cells, nr_pseudobulks, seed)
    225 for x in range(nr_pseudobulks):
    226     random.seed(x)
--> 227     sample_cells = sample(cells, nr_cells)
    228     sub_dgem = dgem.loc[sample_cells, :].mean(axis=0)
    229     sub_auc = cistromes_auc.loc[sample_cells, :].mean(axis=0)

File ~/.conda/envs/py_3_8/lib/python3.8/random.py:363, in Random.sample(self, population, k)
    361 n = len(population)
    362 if not 0 <= k <= n:
--> 363     raise ValueError("Sample larger than population or is negative")
    364 result = [None] * k
    365 setsize = 21        # size of a small set minus size of an empty list

ValueError: Sample larger than population or is negative

Is this related to my ad-hoc solution? Will using the code snippet you provided solve this error downstream?

Appreciate the help! Sid

SeppeDeWinter commented 1 year ago

Hi @sid5427

This is a known "bug" that is caused by the fact that you have an annotation (GEX_cell_type) with a group of fewer than 5 cells.

However, the fact that you're at this step means that SCENIC+ has indeed run successfully. You can skip this optional step for now by setting calculate_TF_eGRN_correlation to False. I will fix this bug as soon as I have some time.
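
To illustrate the failure mode: Python's random.sample raises this ValueError whenever more items are requested than the population holds. The sketch below shows one way to spot small groups and one way to cap the sample size. small_groups and sample_cells are hypothetical helpers for illustration, not the fix that was applied in scenicplus.

```python
import random
from collections import Counter


def small_groups(labels, min_cells):
    """Return {group: count} for groups with fewer than min_cells cells."""
    counts = Counter(labels)
    return {group: n for group, n in counts.items() if n < min_cells}


def sample_cells(cells, nr_cells, seed=0):
    """Sample at most nr_cells cells, never more than are available."""
    rng = random.Random(seed)
    cells = list(cells)
    return rng.sample(cells, min(nr_cells, len(cells)))


# Made-up annotation: one group has only 3 cells.
labels = ["HSC"] * 8 + ["CLP"] * 3
print(small_groups(labels, min_cells=5))  # {'CLP': 3}

# Asking for 5 cells out of 3 reproduces the error from the traceback.
try:
    random.sample(["c1", "c2", "c3"], 5)
except ValueError as err:
    print(err)

# The capped version returns all 3 cells instead of failing.
print(len(sample_cells(["c1", "c2", "c3"], 5)))  # 3
```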

Best,

Seppe

SeppeDeWinter commented 1 year ago

Hi,

you can use this https://github.com/aertslab/scenicplus/commit/6b4bdad3a7761904168702ba9b8c0c395b3afa45 function instead. It does not require generating pseudobulks beforehand.

Best,

Seppe

RosaDeSa commented 10 months ago

Same problem. I don't have menr.pkl and DEM_*_topics.pkl after running run_pycistarget. I have only CTX files. What could be the problem @SeppeDeWinter ?

SeppeDeWinter commented 10 months ago

@RosaDeSa

Did you have any error messages after running run_pycistarget? If not, you can try running using a single core, this might reveal some error message that was not passed properly.
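
Before rerunning, it can help to check which outputs were actually written. This is a small sketch, assuming the file naming seen in this thread (menr.pkl, CTX_*.pkl, DEM_*.pkl); summarize_outputs is a hypothetical helper name.

```python
from pathlib import Path


def summarize_outputs(results_dir):
    """Group the .pkl files in results_dir by kind: CTX, DEM or menr."""
    found = {"CTX": [], "DEM": [], "menr": []}
    for pkl in sorted(Path(results_dir).glob("*.pkl")):
        if pkl.stem == "menr":
            found["menr"].append(pkl.name)
        elif pkl.stem.startswith("CTX_"):
            found["CTX"].append(pkl.name)
        elif pkl.stem.startswith("DEM_"):
            found["DEM"].append(pkl.name)
    return found
```

An empty 'DEM' or 'menr' list from summarize_outputs('results/motifs') would confirm that the run stopped before those files were saved.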

Best,

Seppe

RosaDeSa commented 10 months ago

Thanks @SeppeDeWinter, it worked using a single core!

SeppeDeWinter commented 10 months ago

You did not see any error messages using a single core?

Best,

Seppe

RosaDeSa commented 10 months ago

Oddly, it worked without errors and produced all the output files using a single core. Best, Rosa

CYorick commented 8 months ago

Similar problem: my markers_dict is empty, which may be why a core dies while running run_pycistarget. It also did not create CTX_topics_otsu_All.pkl or any of the other pkl files. Instead, I only have CTX_topics_otsu_All HTML files. Should I combine all the HTML files, convert them into a pkl, and then run Seppe's menr code from above?