aertslab / scenicplus

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Error running run_pycistarget() #102

Closed sneddonucsf closed 1 year ago

sneddonucsf commented 1 year ago

Hello,

I am trying to run run_pycistarget() with the following:

import pyranges as pr
from pycistarget.utils import region_names_to_coordinates
region_sets = {}
region_sets['topics_otsu'] = {}
region_sets['DARs_Annotation'] = {}
for topic in region_bin_topics_otsu.keys():
    regions = region_bin_topics_otsu[topic].index[region_bin_topics_otsu[topic].index.str.startswith('chr')] #only keep regions on known chromosomes
    region_sets['topics_otsu'][topic] = pr.PyRanges(region_names_to_coordinates(regions))
for DAR in markers_dict_Annotation.keys():
    regions = markers_dict_Annotation[DAR].index[markers_dict_Annotation[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    region_sets['DARs_Annotation'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
for key in region_sets.keys():
    print(f'{key}: {region_sets[key].keys()}')

rankings_db = '/wynton/home/sneddon/seandelao1991/scenic_proj/input/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather'
scores_db = '/wynton/home/sneddon/seandelao1991/scenic_proj/input/hg38_screen_v10_clust.regions_vs_motifs.scores.feather'
motif_annotation = '/wynton/home/sneddon/seandelao1991/scenic_proj/input/motifs-v10-nr.hgnc-m0.00001-o0.0.tbl'

import os
import sys  # needed for the sys.stderr assignment below
from scenicplus.wrappers.run_pycistarget import run_pycistarget
sys.stderr = open(os.devnull, "w")  # silence stderr
run_pycistarget(
    region_sets = region_sets,
    species = 'homo_sapiens',
    save_path = os.path.join(outDir, 'motifs'),
    ctx_db_path = rankings_db,
    dem_db_path = scores_db,
    path_to_motif_annotations = motif_annotation,
    run_without_promoters = True,
    n_cpu = 10,
    _temp_dir = os.path.join(tmp_dir, 'ray_spill'),
    annotation_version = 'v10nr_clust')
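
The snippet above points `sys.stderr` at `os.devnull` for the rest of the process, which can also hide later tracebacks and warnings. A scoped alternative is sketched below using only the standard library. `quiet_stderr` is a hypothetical helper name, not part of scenicplus, and it only affects Python-level writes to `sys.stderr`; output written directly to the underlying file descriptor (for example by worker processes) is assumed not to be captured.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def quiet_stderr():
    """Temporarily send Python-level writes to sys.stderr to os.devnull."""
    with open(os.devnull, "w") as devnull, contextlib.redirect_stderr(devnull):
        yield


original_stderr = sys.stderr
with quiet_stderr():
    # Anything printed to sys.stderr inside the block is discarded.
    print("this message is discarded", file=sys.stderr)

# Outside the block sys.stderr is the original stream again,
# so a traceback raised later would still be visible.
restored = sys.stderr is original_stderr
```

The long-running call would go inside the `with quiet_stderr():` block in place of the `print`.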

but I get the following error:

2023-02-07 16:00:26,166 pycisTarget_wrapper INFO     /wynton/home/sneddon/seandelao1991/scenic_proj/output/motifs folder already exists.
2023-02-07 16:00:28,722 pycisTarget_wrapper INFO     Loading cisTarget database for topics_otsu
2023-02-07 16:00:28,722 cisTarget    INFO     Reading cisTarget database
Traceback (most recent call last):
  File "scenic+_4.py", line 54, in <module>
    run_pycistarget(
  File "/wynton/home/sneddon/seandelao1991/scenic_plus/lib/python3.8/site-packages/scenicplus/wrappers/run_pycistarget.py", line 182, in run_pycistarget
    ctx_db = cisTargetDatabase(ctx_db_path, regions)  
  File "/wynton/home/sneddon/seandelao1991/scenic_plus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 67, in __init__
    self.regions_to_db, self.db_rankings, self.total_regions = self.load_db(fname,
  File "/wynton/home/sneddon/seandelao1991/scenic_plus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 119, in load_db
    target_to_db_dict = {x: target_to_query(region_sets[x], list(db_regions), fraction_overlap = fraction_overlap) for x in region_sets.keys()}
  File "/wynton/home/sneddon/seandelao1991/scenic_plus/lib/python3.8/site-packages/pycistarget/motif_enrichment_cistarget.py", line 119, in <dictcomp>
    target_to_db_dict = {x: target_to_query(region_sets[x], list(db_regions), fraction_overlap = fraction_overlap) for x in region_sets.keys()}
  File "/wynton/home/sneddon/seandelao1991/scenic_plus/lib/python3.8/site-packages/pycistarget/utils.py", line 286, in target_to_query
    target_regions = [str(chrom) + ":" + str(start) + '-' + str(end) for chrom, start, end in zip(list(join_pr.Chromosome), list(join_pr.Start), list(join_pr.End))]
  File "/wynton/home/sneddon/seandelao1991/scenic_plus/lib/python3.8/site-packages/pyranges/pyranges.py", line 269, in __getattr__
    return _getattr(self, name)
  File "/wynton/home/sneddon/seandelao1991/scenic_plus/lib/python3.8/site-packages/pyranges/methods/attr.py", line 67, in _getattr
    raise AttributeError("PyRanges object has no attribute", name)
AttributeError: ('PyRanges object has no attribute', 'Chromosome')

My objects are as follows:

topics_otsu: dict_keys(['Topic1', 'Topic2', 'Topic3', 'Topic4', 'Topic5', 'Topic6', 'Topic7', 'Topic8', 'Topic9', 'Topic10', 'Topic11', 'Topic12', 'Topic13', 'Topic14', 'Topic15', 'Topic16', 'Topic17', 'Topic18', 'Topic19', 'Topic20', 'Topic21', 'Topic22', 'Topic23', 'Topic24', 'Topic25'])
DARs_Annotation: dict_keys(['Alpha', 'Beta', 'EPs', 'Epsilon'])
{'topics_otsu': {'Topic1': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 22418995  | 22418995  |
| chr1         | 233993594 | 233993594 |
| chr1         | 215071183 | 215071183 |
| chr1         | 216447919 | 216447919 |
| ...          | ...       | ...       |
| chrX         | 137362327 | 137362327 |
| chrX         | 102659654 | 102659654 |
| chrX         | 18424493  | 18424493  |
| chrX         | 8726051   | 8726051   |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,066 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic2': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 213863998 | 213863998 |
| chr1         | 18072216  | 18072216  |
| chr1         | 5895574   | 5895574   |
| chr1         | 95850862  | 95850862  |
| ...          | ...       | ...       |
| chrX         | 153962214 | 153962214 |
| chrX         | 38561069  | 38561069  |
| chrX         | 106693035 | 106693035 |
| chrX         | 49760333  | 49760333  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 8,402 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic3': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 11454003  | 11454003  |
| chr1         | 54291355  | 54291355  |
| chr1         | 63276767  | 63276767  |
| chr1         | 48245959  | 48245959  |
| ...          | ...       | ...       |
| chrX         | 132975161 | 132975161 |
| chrX         | 39897379  | 39897379  |
| chrX         | 24647139  | 24647139  |
| chrX         | 15259030  | 15259030  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 5,098 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic4': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 234633372 | 234633372 |
| chr1         | 56863194  | 56863194  |
| chr1         | 94962807  | 94962807  |
| chr1         | 99409760  | 99409760  |
| ...          | ...       | ...       |
| chrX         | 52989978  | 52989978  |
| chrX         | 101758957 | 101758957 |
| chrX         | 118116908 | 118116908 |
| chrX         | 16882123  | 16882123  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,480 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic5': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 102500754 | 102500754 |
| chr1         | 235290660 | 235290660 |
| chr1         | 48948056  | 48948056  |
| chr1         | 18723806  | 18723806  |
| ...          | ...       | ...       |
| chrX         | 76172776  | 76172776  |
| chrX         | 15300019  | 15300019  |
| chrX         | 119476013 | 119476013 |
| chrX         | 8083909   | 8083909   |
+--------------+-----------+-----------+
Unstranded PyRanges object has 6,811 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic6': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 17603148  | 17603148  |
| chr1         | 1208330   | 1208330   |
| chr1         | 156852448 | 156852448 |
| chr1         | 32222105  | 32222105  |
| ...          | ...       | ...       |
| chrX         | 153477055 | 153477055 |
| chrX         | 38802979  | 38802979  |
| chrX         | 23332016  | 23332016  |
| chrX         | 74325766  | 74325766  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,523 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic7': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 23554803  | 23554803  |
| chr1         | 164637989 | 164637989 |
| chr1         | 108876675 | 108876675 |
| chr1         | 20786564  | 20786564  |
| ...          | ...       | ...       |
| chrX         | 135343372 | 135343372 |
| chrX         | 15623046  | 15623046  |
| chrX         | 17024843  | 17024843  |
| chrX         | 104156716 | 104156716 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,975 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic8': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 149886540 | 149886540 |
| chr1         | 65420638  | 65420638  |
| chr1         | 32222105  | 32222105  |
| chr1         | 204516117 | 204516117 |
| ...          | ...       | ...       |
| chrX         | 150568824 | 150568824 |
| chrX         | 123322546 | 123322546 |
| chrX         | 48597220  | 48597220  |
| chrX         | 52921343  | 52921343  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,989 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic9': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 147612342 | 147612342 |
| chr1         | 234230250 | 234230250 |
| chr1         | 67139102  | 67139102  |
| chr1         | 57839478  | 57839478  |
| ...          | ...       | ...       |
| chrX         | 49172473  | 49172473  |
| chrX         | 20378141  | 20378141  |
| chrX         | 49171630  | 49171630  |
| chrX         | 36955747  | 36955747  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 5,258 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic10': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 31669451  | 31669451  |
| chr1         | 38213657  | 38213657  |
| chr1         | 182611698 | 182611698 |
| chr1         | 94697762  | 94697762  |
| ...          | ...       | ...       |
| chrX         | 106727122 | 106727122 |
| chrX         | 130339668 | 130339668 |
| chrX         | 109497669 | 109497669 |
| chrX         | 153780671 | 153780671 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,891 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic11': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 36385821  | 36385821  |
| chr1         | 41241846  | 41241846  |
| chr1         | 228103141 | 228103141 |
| chr1         | 181022646 | 181022646 |
| ...          | ...       | ...       |
| chrX         | 153875631 | 153875631 |
| chrX         | 153032528 | 153032528 |
| chrX         | 49002057  | 49002057  |
| chrX         | 41276133  | 41276133  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,515 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic12': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 15424094  | 15424094  |
| chr1         | 187628994 | 187628994 |
| chr1         | 203639101 | 203639101 |
| chr1         | 65208097  | 65208097  |
| ...          | ...       | ...       |
| chrX         | 153525460 | 153525460 |
| chrX         | 150983625 | 150983625 |
| chrX         | 39855613  | 39855613  |
| chrX         | 53049668  | 53049668  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 7,402 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic13': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 114746645 | 114746645 |
| chr1         | 31181577  | 31181577  |
| chr1         | 21439845  | 21439845  |
| chr1         | 225915435 | 225915435 |
| ...          | ...       | ...       |
| chrX         | 101347954 | 101347954 |
| chrX         | 21940034  | 21940034  |
| chrX         | 119791761 | 119791761 |
| chrX         | 121577883 | 121577883 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,193 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic14': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 147599542 | 147599542 |
| chr1         | 43969627  | 43969627  |
| chr1         | 156212697 | 156212697 |
| chr1         | 28235838  | 28235838  |
| ...          | ...       | ...       |
| chrX         | 154378167 | 154378167 |
| chrX         | 49079662  | 49079662  |
| chrX         | 153687497 | 153687497 |
| chrX         | 20141161  | 20141161  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,875 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic15': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 36323987  | 36323987  |
| chr1         | 112381589 | 112381589 |
| chr1         | 179293479 | 179293479 |
| chr1         | 225653408 | 225653408 |
| ...          | ...       | ...       |
| chrX         | 24410722  | 24410722  |
| chrX         | 9538681   | 9538681   |
| chrX         | 101550237 | 101550237 |
| chrX         | 31266731  | 31266731  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,682 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic16': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 182029512 | 182029512 |
| chr1         | 174999737 | 174999737 |
| chr1         | 117059952 | 117059952 |
| chr1         | 27725727  | 27725727  |
| ...          | ...       | ...       |
| chrX         | 107826096 | 107826096 |
| chrX         | 74420571  | 74420571  |
| chrX         | 129980672 | 129980672 |
| chrX         | 44872733  | 44872733  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,384 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic17': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 8026082   | 8026082   |
| chr1         | 179882390 | 179882390 |
| chr1         | 67684898  | 67684898  |
| chr1         | 155273293 | 155273293 |
| ...          | ...       | ...       |
| chrX         | 118495598 | 118495598 |
| chrX         | 119468146 | 119468146 |
| chrX         | 48802221  | 48802221  |
| chrX         | 54639529  | 54639529  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,577 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic18': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 248837957 | 248837957 |
| chr1         | 3857164   | 3857164   |
| chr1         | 227735167 | 227735167 |
| chr1         | 20343184  | 20343184  |
| ...          | ...       | ...       |
| chrX         | 49166225  | 49166225  |
| chrX         | 51893816  | 51893816  |
| chrX         | 18983629  | 18983629  |
| chrX         | 104112243 | 104112243 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,148 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic19': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 174159207 | 174159207 |
| chr1         | 156054424 | 156054424 |
| chr1         | 207321251 | 207321251 |
| chr1         | 42924119  | 42924119  |
| ...          | ...       | ...       |
| chrX         | 119574873 | 119574873 |
| chrX         | 15674587  | 15674587  |
| chrX         | 54183333  | 54183333  |
| chrX         | 85828758  | 85828758  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,335 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic20': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 244652564 | 244652564 |
| chr1         | 179954479 | 179954479 |
| chr1         | 58546537  | 58546537  |
| chr1         | 65309120  | 65309120  |
| ...          | ...       | ...       |
| chrX         | 120486208 | 120486208 |
| chrX         | 49166947  | 49166947  |
| chrX         | 150984220 | 150984220 |
| chrX         | 115561888 | 115561888 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,557 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic21': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 3624582   | 3624582   |
| chr1         | 45499871  | 45499871  |
| chr1         | 25430001  | 25430001  |
| chr1         | 112674396 | 112674396 |
| ...          | ...       | ...       |
| chrX         | 150982946 | 150982946 |
| chrX         | 153970336 | 153970336 |
| chrX         | 151974625 | 151974625 |
| chrX         | 86148634  | 86148634  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,261 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic22': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 93180233  | 93180233  |
| chr1         | 84690236  | 84690236  |
| chr1         | 53945602  | 53945602  |
| chr1         | 172532799 | 172532799 |
| ...          | ...       | ...       |
| chrX         | 147912314 | 147912314 |
| chrX         | 11757620  | 11757620  |
| chrX         | 55717917  | 55717917  |
| chrX         | 135521555 | 135521555 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,697 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic23': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 44013178  | 44013178  |
| chr1         | 200518470 | 200518470 |
| chr1         | 173190365 | 173190365 |
| chr1         | 53048969  | 53048969  |
| ...          | ...       | ...       |
| chrX         | 40290316  | 40290316  |
| chrX         | 123590837 | 123590837 |
| chrX         | 14690463  | 14690463  |
| chrX         | 150636777 | 150636777 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 6,290 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic24': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 98925084  | 98925084  |
| chr1         | 62897829  | 62897829  |
| chr1         | 204340146 | 204340146 |
| chr1         | 192852007 | 192852007 |
| ...          | ...       | ...       |
| chrX         | 134549419 | 134549419 |
| chrX         | 135004018 | 135004018 |
| chrX         | 39601113  | 39601113  |
| chrX         | 33338953  | 33338953  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,296 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic25': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 185091419 | 185091419 |
| chr1         | 10751390  | 10751390  |
| chr1         | 67764357  | 67764357  |
| chr1         | 46042024  | 46042024  |
| ...          | ...       | ...       |
| chrX         | 10831640  | 10831640  |
| chrX         | 28851540  | 28851540  |
| chrX         | 72142398  | 72142398  |
| chrX         | 12504464  | 12504464  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,943 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.}, 'DARs_Annotation': {'Alpha': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 187588828 | 187588828 |
| chr1         | 181169238 | 181169238 |
| chr1         | 208438079 | 208438079 |
| chr1         | 57983652  | 57983652  |
| ...          | ...       | ...       |
| chrX         | 24752334  | 24752334  |
| chrX         | 128728901 | 128728901 |
| chrX         | 22300278  | 22300278  |
| chrX         | 18877421  | 18877421  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 13,817 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Beta': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 189782233 | 189782233 |
| chr1         | 217678482 | 217678482 |
| chr1         | 154946412 | 154946412 |
| chr1         | 155066576 | 155066576 |
| ...          | ...       | ...       |
| chrX         | 51618699  | 51618699  |
| chrX         | 107711024 | 107711024 |
| chrX         | 129140133 | 129140133 |
| chrX         | 107672848 | 107672848 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 16,436 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'EPs': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 11629377  | 11629377  |
| chr1         | 221939597 | 221939597 |
| chr1         | 203527999 | 203527999 |
| chr1         | 61226626  | 61226626  |
| ...          | ...       | ...       |
| chrX         | 19613584  | 19613584  |
| chrX         | 154369871 | 154369871 |
| chrX         | 154515156 | 154515156 |
| chrX         | 74291825  | 74291825  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 12,735 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Epsilon': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 65498279  | 65498279  |
| chr1         | 85245347  | 85245347  |
| chr1         | 106875975 | 106875975 |
| chr1         | 52744185  | 52744185  |
| ...          | ...       | ...       |
| chrX         | 8731984   | 8731984   |
| chrX         | 4622717   | 4622717   |
| chrX         | 121047298 | 121047298 |
| chrX         | 154096908 | 154096908 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 10,930 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.}}
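
In the printout above, Start equals End in every visible row, so each region appears to be zero-width. Whether this is what triggers the AttributeError is an assumption rather than something confirmed in this report; one possibility is that zero-width intervals overlap no database regions, leaving an empty result. A minimal stdlib-only sketch for checking region names before building the PyRanges objects follows. `parse_region` and `zero_width_regions` are hypothetical helper names, not functions from pycistarget or pyranges, and the last example name is made up for contrast.

```python
import re

# Region names in this report have the form "chr4:16524366-16524366".
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(name):
    """Split 'chrom:start-end' into (chrom, start, end)."""
    match = REGION_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised region name: {name!r}")
    return match["chrom"], int(match["start"]), int(match["end"])


def zero_width_regions(names):
    """Return the region names whose start and end coordinates are equal."""
    flagged = []
    for name in names:
        _, start, end = parse_region(name)
        if start == end:
            flagged.append(name)
    return flagged


example_names = [
    "chr4:16524366-16524366",  # taken from the Topic1 output in this report
    "chr6:52181171-52181171",  # taken from the Topic1 output in this report
    "chr1:1000-1500",          # hypothetical 500 bp region for contrast
]
flagged = zero_width_regions(example_names)
print(f"{len(flagged)} of {len(example_names)} regions are zero-width")
```

Running the same check on `region_bin_topics_otsu[topic].index` would show whether all regions are affected, which would suggest the problem lies in how the region names were generated upstream.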
{'Topic1':                             Topic1
chr4:16524366-16524366    0.004349
chr6:52181171-52181171    0.003228
chr15:90963084-90963084   0.002476
chr6:64301632-64301632    0.002437
chr7:69949019-69949019    0.002387
...                            ...
chr1:164745451-164745451  0.000282
chr7:139763304-139763304  0.000282
chr4:88284460-88284460    0.000282
chr16:54939577-54939577   0.000282
chr8:14054129-14054129    0.000282

[4066 rows x 1 columns], 'Topic2':                              Topic2
chr2:175606144-175606144   0.001653
chr5:77292014-77292014     0.001617
chr16:4006001-4006001      0.001401
chr2:157732400-157732400   0.001378
chr5:125579424-125579424   0.001344
...                             ...
chr8:87047312-87047312     0.000156
chr10:60372382-60372382    0.000156
chr17:63246238-63246238    0.000156
chr18:37566565-37566565    0.000156
chr13:112756932-112756932  0.000156

[8402 rows x 1 columns], 'Topic3':                             Topic3
chr6:1641198-1641198      0.002799
chr17:13822669-13822669   0.002356
chr3:132187579-132187579  0.002353
chr3:55128217-55128217    0.002247
chr17:51354272-51354272   0.002123
...                            ...
chr3:111539650-111539650  0.000237
chr22:49278426-49278426   0.000237
chr9:128635107-128635107  0.000237
chr22:20020793-20020793   0.000237
chr17:28745613-28745613   0.000237

[5098 rows x 1 columns], 'Topic4':                              Topic4
chr12:107515608-107515608  0.002703
chr8:37243037-37243037     0.002609
chr13:94223713-94223713    0.002384
chr8:142004425-142004425   0.002273
chr2:175490250-175490250   0.002228
...                             ...
chr16:84594373-84594373    0.000256
chr4:177091987-177091987   0.000256
chr18:76454328-76454328    0.000256
chr1:224623738-224623738   0.000256
chr6:44255024-44255024     0.000256

[4480 rows x 1 columns], 'Topic5':                              Topic5
chr5:96459681-96459681     0.002937
chr4:11593728-11593728     0.002076
chr10:111071399-111071399  0.001932
chr10:90667120-90667120    0.001835
chr19:33462337-33462337    0.001829
...                             ...
chr3:133476883-133476883   0.000190
chr1:233588123-233588123   0.000190
chr9:81626731-81626731     0.000190
chr9:74497754-74497754     0.000190
chr10:101330782-101330782  0.000190

[6811 rows x 1 columns], 'Topic6':                             Topic6
chr1:17603148-17603148    0.003443
chr3:109188703-109188703  0.002564
chr1:1208330-1208330      0.002428
chr8:142339470-142339470  0.002287
chr6:170243738-170243738  0.002225
...                            ...
chr1:54303220-54303220    0.000257
chr16:16097555-16097555   0.000257
chr22:46014808-46014808   0.000257
chr2:70558170-70558170    0.000257
chr5:9903646-9903646      0.000257

[4523 rows x 1 columns], 'Topic7':                              Topic7
chr6:107459422-107459422   0.003942
chr5:6633102-6633102       0.003570
chr4:163166654-163166654   0.002990
chr14:76956303-76956303    0.002956
chr2:27370149-27370149     0.002880
...                             ...
chr10:132446686-132446686  0.000374
chr15:89893784-89893784    0.000374
chr7:140062026-140062026   0.000374
chr10:113563382-113563382  0.000374
chr9:134698444-134698444   0.000374

[2975 rows x 1 columns], 'Topic8':                             Topic8
chr9:130712497-130712497  0.003024
chr12:6904663-6904663     0.002934
chr6:32177469-32177469    0.002509
chr19:56403751-56403751   0.002419
chr5:16936036-16936036    0.002313
...                            ...
chr3:17179727-17179727    0.000286
chr6:158516141-158516141  0.000286
chr3:49553579-49553579    0.000286
chr5:73564760-73564760    0.000286
chr17:58248945-58248945   0.000286

[3989 rows x 1 columns], 'Topic9':                             Topic9
chr16:52568973-52568973   0.002591
chr1:147612342-147612342  0.002026
chr7:141416768-141416768  0.001987
chr2:14761076-14761076    0.001958
chr16:87747918-87747918   0.001956
...                            ...
chr22:26161704-26161704   0.000220
chr1:113110083-113110083  0.000220
chr8:60734814-60734814    0.000220
chr8:96491315-96491315    0.000219
chr21:31604530-31604530   0.000219

[5258 rows x 1 columns], 'Topic10':                             Topic10
chr10:118818604-118818604  0.004393
chr20:46145885-46145885    0.003371
chr1:31669451-31669451     0.002455
chr2:95677901-95677901     0.002344
chr6:38169855-38169855     0.002304
...                             ...
chr11:34256286-34256286    0.000240
chr17:45162065-45162065    0.000240
chr2:119484822-119484822   0.000240
chr1:19067285-19067285     0.000240
chr14:77041897-77041897    0.000240

[4891 rows x 1 columns], 'Topic11':                            Topic11
chr8:73746803-73746803    0.003090
chr11:66466497-66466497   0.003051
chr8:102238705-102238705  0.002977
chr8:124372590-124372590  0.002886
chr16:67109567-67109567   0.002839
...                            ...
chrX:41276133-41276133    0.000447
chr6:27894279-27894279    0.000447
chr10:45727030-45727030   0.000447
chr1:211133266-211133266  0.000447
chr1:156436151-156436151  0.000447

[2515 rows x 1 columns], 'Topic12':                            Topic12
chr16:84014065-84014065   0.002696
chr11:64067233-64067233   0.001704
chr7:22786354-22786354    0.001637
chr3:123157507-123157507  0.001615
chr2:126891890-126891890  0.001604
...                            ...
chr1:155028852-155028852  0.000174
chr4:188439857-188439857  0.000174
chr16:3727770-3727770     0.000174
chr2:172555724-172555724  0.000174
chr9:96875703-96875703    0.000174

[7402 rows x 1 columns], 'Topic13':                            Topic13
chr22:44586890-44586890   0.002861
chr7:157396967-157396967  0.002832
chr7:5478651-5478651      0.002638
chr19:4742798-4742798     0.002629
chr12:28443333-28443333   0.002614
...                            ...
chr6:88741881-88741881    0.000271
chr1:27027393-27027393    0.000271
chr7:40644165-40644165    0.000271
chr5:136202656-136202656  0.000271
chr3:17339034-17339034    0.000271

[4193 rows x 1 columns], 'Topic14':                             Topic14
chr7:100538882-100538882   0.003309
chr7:135211396-135211396   0.003161
chr11:96389675-96389675    0.003151
chr12:122974442-122974442  0.003047
chr19:29844850-29844850    0.002993
...                             ...
chr2:150486750-150486750   0.000413
chr19:47035722-47035722    0.000413
chr6:128882953-128882953   0.000413
chr19:4123994-4123994      0.000413
chr7:26200129-26200129     0.000413

[2875 rows x 1 columns], 'Topic15':                           Topic15
chr7:98557579-98557579   0.002174
chr17:41907889-41907889  0.001942
chr19:58466740-58466740  0.001929
chr18:58576800-58576800  0.001882
chr7:82335850-82335850   0.001819
...                           ...
chr11:17760268-17760268  0.000249
chr1:51879204-51879204   0.000249
chr6:26097241-26097241   0.000249
chr17:40071946-40071946  0.000249
chr4:76179768-76179768   0.000249

[4682 rows x 1 columns], 'Topic16':                             Topic16
chr9:92764741-92764741     0.002671
chr4:37686408-37686408     0.002657
chr17:15999454-15999454    0.002421
chr15:75451576-75451576    0.002356
chr16:67935548-67935548    0.002331
...                             ...
chr20:19758385-19758385    0.000387
chr15:40843715-40843715    0.000387
chr10:26216508-26216508    0.000387
chr1:225999258-225999258   0.000386
chr11:130448267-130448267  0.000386

[3384 rows x 1 columns], 'Topic17':                           Topic17
chr4:1346890-1346890     0.002438
chr4:2262050-2262050     0.002268
chr17:82022833-82022833  0.002212
chr17:19378122-19378122  0.002185
chr14:75258835-75258835  0.002156
...                           ...
chr17:3048771-3048771    0.000377
chr18:22168222-22168222  0.000377
chr17:62479053-62479053  0.000377
chr7:77797891-77797891   0.000377
chr6:13873699-13873699   0.000377

[3577 rows x 1 columns], 'Topic18':                            Topic18
chr1:248837957-248837957  0.002633
chr7:112206274-112206274  0.002232
chr11:2400273-2400273     0.002076
chr3:42600421-42600421    0.001972
chr10:26860752-26860752   0.001968
...                            ...
chr11:65113009-65113009   0.000302
chr5:16465602-16465602    0.000302
chr20:13111456-13111456   0.000302
chr17:56960807-56960807   0.000302
chr1:54989463-54989463    0.000302

[4148 rows x 1 columns], 'Topic19':                             Topic19
chr1:174159207-174159207   0.002791
chr7:100148563-100148563   0.002737
chr7:108569435-108569435   0.002590
chr1:156054424-156054424   0.002478
chr13:100674692-100674692  0.002469
...                             ...
chrX:85828758-85828758     0.000376
chr5:138611141-138611141   0.000376
chr7:7969557-7969557       0.000376
chr21:46286108-46286108    0.000376
chr15:48811028-48811028    0.000376

[3335 rows x 1 columns], 'Topic20':                             Topic20
chr5:2112334-2112334       0.002891
chr15:90201066-90201066    0.002465
chr11:47553037-47553037    0.002392
chr11:118997799-118997799  0.002351
chr2:85538520-85538520     0.002335
...                             ...
chr12:111767366-111767366  0.000332
chr20:1965611-1965611      0.000332
chr10:7410752-7410752      0.000332
chr6:17429671-17429671     0.000332
chrX:115561888-115561888   0.000332

[3557 rows x 1 columns], 'Topic21':                            Topic21
chr8:47260632-47260632    0.002511
chr14:69767503-69767503   0.002478
chr19:18429108-18429108   0.002436
chr5:69414851-69414851    0.002389
chr16:2905619-2905619     0.002362
...                            ...
chr1:14552183-14552183    0.000364
chr2:85595391-85595391    0.000364
chr9:104090931-104090931  0.000363
chr5:172772254-172772254  0.000363
chr12:51217098-51217098   0.000363

[3261 rows x 1 columns], 'Topic22':                            Topic22
chr9:19230445-19230445    0.003031
chr11:57741175-57741175   0.003031
chr22:40950851-40950851   0.002855
chr19:58451177-58451177   0.002715
chr11:13668145-13668145   0.002683
...                            ...
chr19:42254913-42254913   0.000439
chr11:61480894-61480894   0.000439
chr20:46811274-46811274   0.000439
chr3:197002300-197002300  0.000439
chr4:77076136-77076136    0.000439

[2697 rows x 1 columns], 'Topic23':                            Topic23
chr9:78694953-78694953    0.001898
chr8:35761578-35761578    0.001810
chr17:358793-358793       0.001796
chr5:111386291-111386291  0.001699
chr2:3421485-3421485      0.001672
...                            ...
chr10:86711073-86711073   0.000198
chr17:41548860-41548860   0.000198
chr1:21262713-21262713    0.000198
chr10:33172784-33172784   0.000198
chr15:27241321-27241321   0.000198

[6290 rows x 1 columns], 'Topic24':                            Topic24
chr2:30641852-30641852    0.002766
chr5:116448940-116448940  0.002490
chr14:78507891-78507891   0.002459
chr10:76218053-76218053   0.002431
chr3:177035665-177035665  0.002400
...                            ...
chr6:64902489-64902489    0.000318
chr8:42338177-42338177    0.000318
chr7:154304943-154304943  0.000317
chr2:40433780-40433780    0.000317
chr2:47344929-47344929    0.000317

[3296 rows x 1 columns], 'Topic25':                            Topic25
chr12:16330438-16330438   0.003096
chr1:185091419-185091419  0.002378
chr20:23078814-23078814   0.002368
chr8:142004425-142004425  0.002244
chr18:40108466-40108466   0.002193
...                            ...
chr6:53651130-53651130    0.000232
chr16:8172238-8172238     0.000231
chr19:5093959-5093959     0.000231
chr5:16773991-16773991    0.000231
chr1:93592230-93592230    0.000231

[4943 rows x 1 columns]}

Does the issue have to do with my pyranges objects? I downloaded my databases from the cisTarget database. Thank you!

SeppeDeWinter commented 1 year ago

Hi @sneddonucsf

It could be that for one (or more) of the topics none of the regions are overlapping with the cistarget database.

You can test this like this:


from ctxcore.rnkdb import FeatherRankingDatabase
from pycistarget.utils import target_to_query

db = FeatherRankingDatabase(rankings_db, name=None)
db_regions = db.genes

for topic in region_sets['topics_otsu'].keys():
    try:
        overlapping_regions = target_to_query(
            region_sets['topics_otsu'][topic],
            list(db_regions),
            fraction_overlap = 0.4)
    except Exception:
        # print the topics for which the overlap computation fails
        print(topic)

Could you show the output of this?

Best,

Seppe

sneddonucsf commented 1 year ago

@SeppeDeWinter I tried:

import pyranges as pr
from pycistarget.utils import region_names_to_coordinates
region_sets = {}
region_sets['topics_otsu'] = {}
region_sets['DARs_Annotation'] = {}
for topic in region_bin_topics_otsu.keys():
    regions = region_bin_topics_otsu[topic].index[region_bin_topics_otsu[topic].index.str.startswith('chr')] #only keep regions on known chromosomes
    region_sets['topics_otsu'][topic] = pr.PyRanges(region_names_to_coordinates(regions))
for DAR in markers_dict_Annotation.keys():
    regions = markers_dict_Annotation[DAR].index[markers_dict_Annotation[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    region_sets['DARs_Annotation'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
for key in region_sets.keys():
    print(f'{key}: {region_sets[key].keys()}')

rankings_db = '/wynton/home/sneddon/seandelao1991/scenic_proj/input/hg38_screen_v10_clust.regions_vs_motifs.rankings.feather'

from ctxcore.rnkdb import FeatherRankingDatabase
from pycistarget.utils import target_to_query

db = FeatherRankingDatabase(rankings_db, name=None)
db_regions = db.genes

for topic in region_sets['topics_otsu'].keys():
    try:
      overlapping_regions =  target_to_query(
          region_sets['topics_otsu'][topic], 
          list(db_regions), 
          fraction_overlap = 0.4)
    except:
      print(topic)

And got the error:

Traceback (most recent call last):
  File "scenic+_6.py", line 53, in <module>
    db = FeatherRankingDatabase(rankings_db, name=None)
  File "/wynton/home/sneddon/seandelao1991/scenic_plus/lib/python3.8/site-packages/ctxcore/rnkdb.py", line 104, in __init__
    super().__init__(name=name)
  File "/wynton/home/sneddon/seandelao1991/scenic_plus/lib/python3.8/site-packages/ctxcore/rnkdb.py", line 29, in __init__
    assert name, "Name must be specified."
AssertionError: Name must be specified.

SeppeDeWinter commented 1 year ago

Hi, my mistake.

Can you do:


db = FeatherRankingDatabase(rankings_db, name="test")

instead?

Best,

Seppe

sneddonucsf commented 1 year ago

@SeppeDeWinter the output is:

topics_otsu: dict_keys(['Topic1', 'Topic2', 'Topic3', 'Topic4', 'Topic5', 'Topic6', 'Topic7', 'Topic8', 'Topic9', 'Topic10', 'Topic11', 'Topic12', 'Topic13', 'Topic14', 'Topic15', 'Topic16', 'Topic17', 'Topic18', 'Topic19', 'Topic20', 'Topic21', 'Topic22', 'Topic23', 'Topic24', 'Topic25'])
DARs_Annotation: dict_keys(['Alpha', 'Beta', 'EPs', 'Epsilon'])
Topic1
Topic2
Topic3
Topic4
Topic5
Topic6
Topic7
Topic8
Topic9
Topic10
Topic11
Topic12
Topic13
Topic14
Topic15
Topic16
Topic17
Topic18
Topic19
Topic20
Topic21
Topic22
Topic23
Topic24
Topic25

SeppeDeWinter commented 1 year ago

Hmm, interesting. It seems that none of the regions in your topics overlap with the cistarget database, which is very unexpected.

Looking at the regions you show above, they all seem to have a length of 0 (start and end are identical). For example:

chr8:47260632-47260632    0.002511
chr14:69767503-69767503   0.002478
chr19:18429108-18429108   0.002436
chr5:69414851-69414851    0.002389
chr16:2905619-2905619     0.002362

By default these should be of length 500. This is probably the cause of the issue! Can you recheck how you ran the consensus peak calling?
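
A quick way to confirm the zero-length suspicion is to parse the region names and compute end minus start. The following is only a minimal sketch in plain Python: the helper names `region_width` and `zero_width_regions` are hypothetical (not part of pycistarget or SCENIC+), and the third example region is made up for contrast.

```python
import re

# Region names are assumed to follow the 'chrom:start-end' pattern shown above.
REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def region_width(name):
    """Return end - start for a region name such as 'chr8:47260632-47260632'."""
    match = REGION_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognised region name: {name!r}")
    return int(match["end"]) - int(match["start"])


def zero_width_regions(names):
    """Return the region names whose width is zero or negative."""
    return [name for name in names if region_width(name) <= 0]


example = [
    "chr8:47260632-47260632",   # taken from the Topic21 output above
    "chr14:69767503-69767503",  # taken from the Topic21 output above
    "chr1:1000-1500",           # hypothetical 500 bp region for comparison
]
bad = zero_width_regions(example)
print(f"{len(bad)} of {len(example)} regions have zero width")
```

The same check could be applied to the index of each DataFrame in `region_bin_topics_otsu` before building the pyranges objects.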

Just to be absolutely sure this is the issue, can you also show the output of:


db_regions[0:100]

Best,

Seppe

sneddonucsf commented 1 year ago

@SeppeDeWinter silly mistake, I accidentally copied my 'start' position twice instead of 'start' and 'end' when exporting my peaks from ArchR. Fixed it, ran run_pycistarget() and everything went smoothly. Thank you and sorry to bother!
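
For anyone hitting the same problem when exporting peaks from another tool: a small guard when building the region names catches a duplicated start coordinate early. This is a sketch under assumptions, not how ArchR or SCENIC+ do it; `make_region_name` and the example peaks are hypothetical.

```python
def make_region_name(chrom, start, end):
    """Build a 'chrom:start-end' region name, refusing empty or inverted regions."""
    start, end = int(start), int(end)
    if end <= start:
        raise ValueError(
            f"Region {chrom}:{start}-{end} has non-positive width; "
            "check that 'start' was not exported twice instead of 'start' and 'end'."
        )
    return f"{chrom}:{start}-{end}"


# Hypothetical exported peaks as (chromosome, start, end) tuples.
peaks = [("chr1", 1000, 1500), ("chr2", 2000, 2500)]
names = [make_region_name(*peak) for peak in peaks]
print(names)
```

With the coordinates from this issue, `make_region_name("chr8", 47260632, 47260632)` would raise a `ValueError` instead of silently producing a zero-length region.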