aertslab / scenicplus

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Trouble with run_pycistarget #370

Open mwetzel7 opened 5 months ago

mwetzel7 commented 5 months ago

Describe the bug

When I try to run "run_pycistarget", I get this error: ValueError: invalid literal for int() with base 10: '1.17e+08'

To Reproduce

I was following the 10x multiome tutorial and trying to run the "run_pycistarget" wrapper from "scenicplus.wrappers.run_pycistarget". That wrapper no longer seems to be available, but I also tried the cisTarget for SCENIC+ tutorial here: https://pycistarget.readthedocs.io/en/latest/pycistarget_scenic%2B_wrapper.html and got the same error. For inputs, I created my own cisTarget databases (my data were aligned to hg19) by following the provided tutorials.

Error output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[15], line 2
      1 from scenicplus.wrappers.run_pycistarget import run_pycistarget
----> 2 run_pycistarget(
      3     region_sets = region_sets,
      4     species = 'homo_sapiens',
      5     save_path = os.path.join(outDir, 'motifs'),
      6     ctx_db_path = rankings_db,
      7     dem_db_path = scores_db,
      8     path_to_motif_annotations = motif_annotation,
      9     run_without_promoters = True,
     10     n_cpu = 8,
     11     _temp_dir = os.path.join(tmpDir2, 'ray_spill'),
     12     annotation_version = 'v10nr',
     13     )

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/wrappers/run_pycistarget.py:182, in run_pycistarget(region_sets, species, save_path, custom_annot, save_partial, ctx_db_path, dem_db_path, run_without_promoters, biomart_host, promoter_space, ctx_auc_threshold, ctx_nes_threshold, ctx_rank_threshold, dem_log2fc_thr, dem_motif_hit_thr, dem_max_bg_regions, annotation, motif_similarity_fdr, path_to_motif_annotations, annotation_version, n_cpu, _temp_dir, exclude_motifs, exclude_collection, **kwargs)
    180 ## CISTARGET
    181 regions = region_sets[key]
--> 182 ctx_db = cisTargetDatabase(ctx_db_path, regions)
    183 if exclude_motifs is not None:
    184     out = pd.read_csv(exclude_motifs, header=None).iloc[:,0].tolist()

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/motif_enrichment_cistarget.py:55, in cisTargetDatabase.__init__(self, fname, region_sets, name, fraction_overlap)
     36 def __init__(self,
     37             fname: str,
     38             region_sets: Union[Dict[str, pr.PyRanges], pr.PyRanges] = None,
     39             name: Optional[str] = None,
     40             fraction_overlap: float = 0.4):
     41     """
     42     Initialize cisTargetDatabase
     43
   (...)
     53         Minimal overlap between query and regions in the database for the mapping.
     54     """
---> 55     self.regions_to_db, self.db_rankings, self.total_regions = self.load_db(fname,
     56                                                       region_sets,
     57                                                       name,
     58                                                       fraction_overlap)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/motif_enrichment_cistarget.py:108, in cisTargetDatabase.load_db(self, fname, region_sets, name, fraction_overlap)
    106 if region_sets is not None:
    107     if type(region_sets) == dict:
--> 108         target_to_db_dict = {x: target_to_query(region_sets[x], list(db_regions), fraction_overlap = fraction_overlap) for x in region_sets.keys()}
    109         target_regions_in_db = list(set(sum([target_to_db_dict[x]['Query'].tolist() for x in target_to_db_dict.keys()],[])))
    110     elif type(region_sets) == pr.PyRanges:

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/motif_enrichment_cistarget.py:108, in <dictcomp>(.0)
    106 if region_sets is not None:
    107     if type(region_sets) == dict:
--> 108         target_to_db_dict = {x: target_to_query(region_sets[x], list(db_regions), fraction_overlap = fraction_overlap) for x in region_sets.keys()}
    109         target_regions_in_db = list(set(sum([target_to_db_dict[x]['Query'].tolist() for x in target_to_db_dict.keys()],[])))
    110     elif type(region_sets) == pr.PyRanges:

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/utils.py:280, in target_to_query(target, query, fraction_overlap)
    278     query_pr=pr.read_bed(query)
    279 if isinstance(query, list):
--> 280     query_pr=pr.PyRanges(region_names_to_coordinates(query))
    281 if isinstance(query, pr.PyRanges):
    282     query_pr=query

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/utils.py:31, in region_names_to_coordinates(region_names)
     29 chrom=pd.DataFrame([i.split(':', 1)[0] for i in region_names if ':' in i])
     30 coor = [i.split(':', 1)[1] for i in region_names if ':' in i]
---> 31 start=pd.DataFrame([int(i.split('-', 1)[0]) for i in coor])
     32 end=pd.DataFrame([int(i.split('-', 1)[1]) for i in coor])
     33 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/utils.py:31, in <listcomp>(.0)
     29 chrom=pd.DataFrame([i.split(':', 1)[0] for i in region_names if ':' in i])
     30 coor = [i.split(':', 1)[1] for i in region_names if ':' in i]
---> 31 start=pd.DataFrame([int(i.split('-', 1)[0]) for i in coor])
     32 end=pd.DataFrame([int(i.split('-', 1)[1]) for i in coor])
     33 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)

ValueError: invalid literal for int() with base 10: '1.17e+08'

Screenshots

Here are screenshots of my custom cisTarget feather DBs after reading them in with "pandas.read_feather" (the first few and last few columns of the scores and rankings):

[Four screenshots taken 2024-04-25, 11:39 to 11:40 AM; images not preserved]

I'm not sure why I'm getting this error, or whether it comes from my custom DB or from something else.

Thank you for your help!
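For reference, the failing step can be reproduced in isolation. The sketch below mirrors the string splitting shown at lines 29-32 of pycistarget/utils.py in the traceback above; it is not the library's code. The function name `parse_region_name` and both region names are hypothetical, and the idea that a database column name contains a coordinate in scientific notation is an assumption that the thread has not confirmed at this point.

```python
def parse_region_name(name: str) -> tuple[str, int, int]:
    """Split 'chrom:start-end' into (chrom, start, end) with integer coordinates."""
    # Same split order as the traceback: chromosome first, then start and end.
    chrom, coor = name.split(":", 1)
    start, end = coor.split("-", 1)
    return chrom, int(start), int(end)


# A well-formed (hypothetical) region name parses cleanly.
print(parse_region_name("chr1:201986333-201988638"))

# A hypothetical name whose start was written in scientific notation
# raises the same kind of ValueError as in the report.
try:
    parse_region_name("chr1:1.17e+08-117000500")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '1.17e+08'
```

The point is only that `int()` rejects a string such as '1.17e+08', so a single malformed region name is enough to trigger the error.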

SeppeDeWinter commented 5 months ago

Hi @mwetzel7

Indeed, this wrapper function is now deprecated. I would suggest following the new tutorials at https://scenicplus.readthedocs.io/en/latest/tutorials.html.

As for your error, can you show what your region_sets looks like?

All the best,

Seppe

mwetzel7 commented 5 months ago

Hi Seppe,

Thanks for the updated link.

Here is the output of my region_sets object:

{'topics': {'Topic1':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 201986333 | 201988638 |
| chr1         | 202025595 | 202028101 |
| chr1         | 1708993   | 1712275   |
| chr1         | 223853125 | 223854294 |
| ...          | ...       | ...       |
| chrX         | 16042126  | 16042993  |
| chrX         | 3012333   | 3012751   |
| chrX         | 20159306  | 20160422  |
| chrX         | 17377623  | 17378030  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,742 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic2':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 209989122 | 209990158 |
| chr1         | 161043203 | 161045212 |
| chr1         | 167632104 | 167633831 |
| chr1         | 109805761 | 109806941 |
| ...          | ...       | ...       |
| chrX         | 106817415 | 106818172 |
| chrX         | 49686815  | 49687669  |
| chrX         | 114257621 | 114258236 |
| chrX         | 103172609 | 103174306 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,428 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic3':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 170095644 | 170096356 |
| chr1         | 177178306 | 177178991 |
| chr1         | 150539385 | 150547703 |
| chr1         | 157938117 | 157940226 |
| ...          | ...       | ...       |
| chrX         | 118107760 | 118111129 |
| chrX         | 45709848  | 45711508  |
| chrX         | 39589971  | 39590364  |
| chrX         | 40856170  | 40856613  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,507 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic4':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 203069483 | 203070178 |
| chr1         | 181080723 | 181082733 |
| chr1         | 75118075  | 75119037  |
| chr1         | 92791818  | 92792769  |
| ...          | ...       | ...       |
| chrX         | 133593737 | 133594720 |
| chrX         | 13503234  | 13503688  |
| chrX         | 21824867  | 21825200  |
| chrX         | 65144901  | 65145415  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,619 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic5':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 168732171 | 168732836 |
| chr1         | 152878275 | 152879255 |
| chr1         | 175439944 | 175441001 |
| chr1         | 97025400  | 97026316  |
| ...          | ...       | ...       |
| chrX         | 9880975   | 9881504   |
| chrX         | 62780548  | 62781211  |
| chrX         | 115630870 | 115631435 |
| chrX         | 47059803  | 47060150  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,702 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic6':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 111019479 | 111020693 |
| chr1         | 231761911 | 231764351 |
| chr1         | 156783309 | 156785678 |
| chr1         | 173159297 | 173160632 |
| ...          | ...       | ...       |
| chrX         | 24068560  | 24069012  |
| chrX         | 116510328 | 116510898 |
| chrX         | 23799330  | 23800080  |
| chrX         | 101914619 | 101915215 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,735 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic7':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 9648373   | 9650268   |
| chr1         | 86042110  | 86044683  |
| chr1         | 6761479   | 6762421   |
| chr1         | 1891134   | 1891915   |
| ...          | ...       | ...       |
| chrX         | 153941048 | 153941760 |
| chrX         | 122866614 | 122867192 |
| chrX         | 48858439  | 48859194  |
| chrX         | 40594391  | 40595602  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,800 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic8':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 7022237   | 7023848   |
| chr1         | 45284563  | 45286678  |
| chr1         | 175138133 | 175139160 |
| chr1         | 230818904 | 230820453 |
| ...          | ...       | ...       |
| chrX         | 17626962  | 17627475  |
| chrX         | 43365026  | 43365394  |
| chrX         | 9504216   | 9504473   |
| chrX         | 128494140 | 128494570 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 7,684 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic9':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 198624686 | 198627472 |
| chr1         | 56184332  | 56185091  |
| chr1         | 228996555 | 228997478 |
| chr1         | 147016695 | 147017409 |
| ...          | ...       | ...       |
| chrX         | 17423245  | 17424040  |
| chrX         | 71249709  | 71250540  |
| chrX         | 122221208 | 122221661 |
| chrX         | 56771419  | 56771830  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,849 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic10':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 167416630 | 167417442 |
| chr1         | 115211696 | 115214930 |
| chr1         | 205894584 | 205895783 |
| chr1         | 244504701 | 244505629 |
| ...          | ...       | ...       |
| chrX         | 128979530 | 128980262 |
| chrX         | 24524109  | 24524582  |
| chrX         | 77631331  | 77631947  |
| chrX         | 129095071 | 129095677 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,178 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic11':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 209779814 | 209787548 |
| chr1         | 21620355  | 21621875  |
| chr1         | 225887664 | 225888597 |
| chr1         | 183149634 | 183150592 |
| ...          | ...       | ...       |
| chrX         | 73755176  | 73756869  |
| chrX         | 16042126  | 16042993  |
| chrX         | 39754585  | 39755340  |
| chrX         | 19192737  | 19193156  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,240 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic12':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 85100087  | 85101090  |
| chr1         | 147229238 | 147230259 |
| chr1         | 75118075  | 75119037  |
| chr1         | 201425495 | 201426631 |
| ...          | ...       | ...       |
| chrX         | 152965100 | 152966525 |
| chrX         | 46630167  | 46630620  |
| chrX         | 54466448  | 54467418  |
| chrX         | 153058811 | 153060639 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,727 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic13':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 15850334  | 15853944  |
| chr1         | 59245323  | 59251724  |
| chr1         | 154942503 | 154948437 |
| chr1         | 212779946 | 212783210 |
| ...          | ...       | ...       |
| chrX         | 152973757 | 152974461 |
| chrX         | 118986485 | 118987412 |
| chrX         | 149369046 | 149369708 |
| chrX         | 152863486 | 152865163 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 1,926 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic14':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 149856121 | 149860717 |
| chr1         | 113931133 | 113934939 |
| chr1         | 110880229 | 110883171 |
| chr1         | 114446833 | 114448476 |
| ...          | ...       | ...       |
| chrX         | 38662680  | 38665543  |
| chrX         | 23825231  | 23826259  |
| chrX         | 1571450   | 1573499   |
| chrX         | 153235448 | 153238827 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,382 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic15':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 115721778 | 115722952 |
| chr1         | 159046137 | 159047578 |
| chr1         | 86072413  | 86073567  |
| chr1         | 117079736 | 117081832 |
| ...          | ...       | ...       |
| chrX         | 29680185  | 29681712  |
| chrX         | 49040631  | 49041569  |
| chrX         | 102911509 | 102912317 |
| chrX         | 153275683 | 153276467 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,726 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic16':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 207174932 | 207176903 |
| chr1         | 201986333 | 201988638 |
| chr1         | 181086404 | 181087520 |
| chr1         | 181088207 | 181089299 |
| ...          | ...       | ...       |
| chrX         | 152965100 | 152966525 |
| chrX         | 128979530 | 128980262 |
| chrX         | 12759422  | 12760005  |
| chrX         | 112083609 | 112085109 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,821 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic17':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 95107099  | 95108569  |
| chr1         | 64808739  | 64810266  |
| chr1         | 94167026  | 94168182  |
| chr1         | 117184849 | 117186550 |
| ...          | ...       | ...       |
| chrX         | 106045355 | 106046379 |
| chrX         | 46119847  | 46120427  |
| chrX         | 55945818  | 55946748  |
| chrX         | 22264661  | 22265196  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,750 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic18':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 118147617 | 118150425 |
| chr1         | 11967662  | 11970199  |
| chr1         | 156629521 | 156632021 |
| chr1         | 173792932 | 173795064 |
| ...          | ...       | ...       |
| chrX         | 54665641  | 54666702  |
| chrX         | 49011554  | 49012955  |
| chrX         | 152109751 | 152110779 |
| chrX         | 47517880  | 47518995  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,497 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic19':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 78004593  | 78006830  |
| chr1         | 183247884 | 183249467 |
| chr1         | 201278389 | 201281177 |
| chr1         | 71769188  | 71769943  |
| ...          | ...       | ...       |
| chrX         | 54518832  | 54519227  |
| chrX         | 109411080 | 109412148 |
| chrX         | 48827844  | 48828824  |
| chrX         | 151988254 | 151988654 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 5,390 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'Topic20':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 113006903 | 113009217 |
| chr1         | 84503534  | 84504271  |
| chr1         | 197752445 | 197753288 |
| chr1         | 155144977 | 155151656 |
| ...          | ...       | ...       |
| chrX         | 106983255 | 106983693 |
| chrX         | 62654009  | 62654923  |
| chrX         | 53009601  | 53010033  |
| chrX         | 134232613 | 134233347 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,678 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.},
'DARs': {'treated':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 112198722 | 112199143 |
| chr1         | 223564008 | 223564305 |
| chr1         | 153019004 | 153019252 |
| chr1         | 211987098 | 211987288 |
| ...          | ...       | ...       |
| chrX         | 10812386  | 10812848  |
| chrX         | 115412378 | 115412710 |
| chrX         | 55945818  | 55946748  |
| chrX         | 56253964  | 56254426  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 7,202 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.,
'untreated':
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 50712759  | 50712942  |
| chr1         | 100918257 | 100918507 |
| chr1         | 85048919  | 85049240  |
| chr1         | 61753362  | 61753709  |
| ...          | ...       | ...       |
| chrX         | 19865019  | 19865385  |
| chrX         | 24524109  | 24524582  |
| chrX         | 128811893 | 128812654 |
| chrX         | 150017336 | 150017897 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,418 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.}}

SeppeDeWinter commented 4 months ago

Hi @mwetzel7

Not sure what is going on here.

Can you try the following:


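# Read your feather database and drop the "motifs" column so only region columns remain
import pandas as pd
from pycistarget.utils import region_names_to_coordinates

db = pd.read_feather("<PATH_TO_DATABASE>").drop("motifs", axis=1)

# Parse every region name (column name) into chromosome, start and end
test = region_names_to_coordinates(db.columns)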

Does that produce the same error?

Best,

Seppe

mwetzel7 commented 4 months ago

Hi @SeppeDeWinter

Yes, running that code does produce the same error (for both rankings and scores DB):

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[15], line 9
      5 db = pd.read_feather(rankings_db).drop("motifs", axis = 1)
      7 from pycistarget.utils import region_names_to_coordinates
----> 9 test = region_names_to_coordinates(db.columns)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/utils.py:31, in region_names_to_coordinates(region_names)
     29 chrom=pd.DataFrame([i.split(':', 1)[0] for i in region_names if ':' in i])
     30 coor = [i.split(':', 1)[1] for i in region_names if ':' in i]
---> 31 start=pd.DataFrame([int(i.split('-', 1)[0]) for i in coor])
     32 end=pd.DataFrame([int(i.split('-', 1)[1]) for i in coor])
     33 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/utils.py:31, in <listcomp>(.0)
     29 chrom=pd.DataFrame([i.split(':', 1)[0] for i in region_names if ':' in i])
     30 coor = [i.split(':', 1)[1] for i in region_names if ':' in i]
---> 31 start=pd.DataFrame([int(i.split('-', 1)[0]) for i in coor])
     32 end=pd.DataFrame([int(i.split('-', 1)[1]) for i in coor])
     33 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)

ValueError: invalid literal for int() with base 10: '1.17e+08'

SeppeDeWinter commented 4 months ago

Hi @mwetzel7

Then somehow there is something wrong with the region names in those databases.

You can check which one is the culprit by running:


# Read your feather database and drop the "motifs" column so only region columns remain
import pandas as pd
from pycistarget.utils import region_names_to_coordinates

db = pd.read_feather("<PATH_TO_DATABASE>").drop("motifs", axis=1)

# Parse each region name on its own and print the ones that fail
for region in db.columns:
    try:
        region_names_to_coordinates([region])
    except Exception:
        print(region)
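
The same check can also be done without pycistarget, by testing each column name against the `chr:start-end` pattern that the traceback above shows being parsed. The following is a minimal sketch under that assumption; `find_malformed_regions` and `propose_fix` are hypothetical helper names, not part of pycistarget. `propose_fix` only handles coordinates written in scientific notation and assumes the value was not rounded when it was written, so any suggested name should be checked against the original peak file.

```python
import re

# A well-formed region name: <chromosome>:<integer start>-<integer end>
REGION_RE = re.compile(r"^[^:]+:\d+-\d+$")


def find_malformed_regions(names):
    """Return the names that do not match the chr:start-end pattern."""
    return [name for name in names if not REGION_RE.match(name)]


def _to_int(text):
    """Convert '117000643' or '1.17e+08' to an int; reject fractional values."""
    value = float(text)
    if not value.is_integer():
        raise ValueError(f"not an integer coordinate: {text!r}")
    return int(value)


def propose_fix(name):
    """Rewrite scientific-notation coordinates as plain integers.

    Assumption: the coordinate was not rounded when it was written.
    """
    chrom, coords = name.split(":", 1)
    start_text, end_text = coords.split("-", 1)
    return f"{chrom}:{_to_int(start_text)}-{_to_int(end_text)}"


# Stand-in for list(db.columns) after dropping "motifs"
columns = ["chr1:207174932-207176903", "chr8:1.17e+08-117000643"]
for bad in find_malformed_regions(columns):
    print(bad, "->", propose_fix(bad))
```

Unlike the try/except loop, this reports every name that deviates from the pattern in one pass, including names that would fail for reasons other than scientific notation.
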
mwetzel7 commented 4 months ago

Hi @SeppeDeWinter

Using the above code I was able to see one column named incorrectly: chr8:1.17e+08-117000643

I read the full db in and renamed this column to: chr8:117000000-117000643
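
For anyone hitting the same problem, the rename can be sketched with pandas as follows. This is a minimal illustration, not the exact code used in this thread: `rename_region` is a hypothetical helper, and the small DataFrame stands in for the real database, which would be loaded with `pd.read_feather` as in the snippets above.

```python
import pandas as pd


def rename_region(db, old_name, new_name):
    """Return a copy of db with one region column renamed.

    Raises instead of silently doing nothing or creating a duplicate column.
    """
    if old_name not in db.columns:
        raise KeyError(f"column not found: {old_name}")
    if new_name in db.columns:
        raise ValueError(f"column already exists: {new_name}")
    return db.rename(columns={old_name: new_name})


# Tiny stand-in for a cisTarget database: region columns plus "motifs"
demo = pd.DataFrame(
    {
        "chr1:100-200": [2, 5],
        "chr8:1.17e+08-117000643": [3, 1],
        "motifs": ["motif_a", "motif_b"],
    }
)

fixed = rename_region(demo, "chr8:1.17e+08-117000643", "chr8:117000000-117000643")
print(list(fixed.columns))
```

With the real database, the result can be written back with `fixed.to_feather(...)`; writing to a new file name rather than overwriting keeps the original available for comparison.
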

That gets me past the original error, but I now get a different one: NameError: name 'run_cistarget' is not defined. Above you said it is deprecated and to follow the new tutorials; however, I do not see a tutorial at the link provided that runs pycisTarget. The tutorials on the pycisTarget page still say to use the run_cistarget wrapper from SCENIC+.

Sorry if I missed it, but how should I run pycisTarget now?

SeppeDeWinter commented 4 months ago

Hi @mwetzel7

Yes, I should still update the pycistarget tutorials. Sorry about that.

If you are running pycistarget in the context of a SCENIC+ analysis, you can follow this tutorial: https://scenicplus.readthedocs.io/en/latest/human_cerebellum.html . In that case the Snakemake pipeline will automatically run pycistarget.

If you want to run pycistarget on its own, it now has a command-line interface; see the example below.


pycistarget cistarget \
  --cistarget_db_fname <PATH_TO_YOUR_DATABASE> \
  --bed_fname <PATH_TO_YOUR_BED_FILE> \
  --output_folder <PATH_TO_OUTPUT_FOLDER> \
  --species <SPECIES_NAME> \
  --write_html
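
When working from a notebook, the command above can also be assembled in Python and passed to `subprocess`. The sketch below uses only the flags shown in the command above; the helper name `build_cistarget_command` and the example file names are hypothetical, and passing `homo_sapiens` as the species value is an assumption carried over from the old wrapper call earlier in this thread.

```python
import shlex


def build_cistarget_command(db_path, bed_path, output_folder, species, write_html=True):
    """Assemble the pycistarget command line as an argument list."""
    cmd = [
        "pycistarget", "cistarget",
        "--cistarget_db_fname", db_path,
        "--bed_fname", bed_path,
        "--output_folder", output_folder,
        "--species", species,
    ]
    if write_html:
        cmd.append("--write_html")
    return cmd


# Hypothetical paths, for illustration only
cmd = build_cistarget_command(
    "rankings.feather", "topic16.bed", "cistarget_out", "homo_sapiens"
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# To actually run it (requires pycistarget to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.
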

I hope that helps.

All the best,

Seppe