aertslab / scenicplus

SCENIC+ is a Python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.

Trouble with run_pycistarget #370

Open mwetzel7 opened 5 months ago

mwetzel7 commented 5 months ago

Describe the bug When trying to run "run_pycistarget" I get this error: ValueError: invalid literal for int() with base 10: '1.17e+08'

To Reproduce I was following the 10x multiome tutorial, and trying to run the "run_pycistarget" wrapper from "scenicplus.wrappers.run_pycistarget". That doesn't seem available anymore, but I also tried with the cisTarget for SCENIC+ tutorial here: and got the same error. For inputs, I had created my own cisTarget databases (as my data were aligned to hg19) by following the provided tutorials to do so.

Error output

ValueError                                Traceback (most recent call last)
Cell In[15], line 2
      1 from scenicplus.wrappers.run_pycistarget import run_pycistarget
----> 2 run_pycistarget(
      3     region_sets = region_sets,
      4     species = 'homo_sapiens',
      5     save_path = os.path.join(outDir, 'motifs'),
      6     ctx_db_path = rankings_db,
      7     dem_db_path = scores_db,
      8     path_to_motif_annotations = motif_annotation,
      9     run_without_promoters = True,
     10     n_cpu = 8,
     11     _temp_dir = os.path.join(tmpDir2, 'ray_spill'),
     12     annotation_version = 'v10nr',
     13     )

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/scenicplus/wrappers/, in run_pycistarget(region_sets, species, save_path, custom_annot, save_partial, ctx_db_path, dem_db_path, run_without_promoters, biomart_host, promoter_space, ctx_auc_threshold, ctx_nes_threshold, ctx_rank_threshold, dem_log2fc_thr, dem_motif_hit_thr, dem_max_bg_regions, annotation, motif_similarity_fdr, path_to_motif_annotations, annotation_version, n_cpu, _temp_dir, exclude_motifs, exclude_collection, **kwargs)
    180 ## CISTARGET
    181 regions = region_sets[key]
--> 182 ctx_db = cisTargetDatabase(ctx_db_path, regions)
    183 if exclude_motifs is not None:
    184     out = pd.read_csv(exclude_motifs, header=None).iloc[:,0].tolist()

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/, in cisTargetDatabase.__init__(self, fname, region_sets, name, fraction_overlap)
     36 def __init__(self,
     37             fname: str,
     38             region_sets: Union[Dict[str, pr.PyRanges], pr.PyRanges] = None,
     39             name: Optional[str] = None,
     40             fraction_overlap: float = 0.4):
     41     """
     42     Initialize cisTargetDatabase
     53         Minimal overlap between query and regions in the database for the mapping.
     54     """
---> 55     self.regions_to_db, self.db_rankings, self.total_regions = self.load_db(fname,
     56                                                       region_sets,
     57                                                       name,
     58                                                       fraction_overlap)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/, in cisTargetDatabase.load_db(self, fname, region_sets, name, fraction_overlap)
    106 if region_sets is not None:
    107     if type(region_sets) == dict:
--> 108         target_to_db_dict = {x: target_to_query(region_sets[x], list(db_regions), fraction_overlap = fraction_overlap) for x in region_sets.keys()}
    109         target_regions_in_db = list(set(sum([target_to_db_dict[x]['Query'].tolist() for x in target_to_db_dict.keys()],[])))
    110     elif type(region_sets) == pr.PyRanges:

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/, in <dictcomp>(.0)
    106 if region_sets is not None:
    107     if type(region_sets) == dict:
--> 108         target_to_db_dict = {x: target_to_query(region_sets[x], list(db_regions), fraction_overlap = fraction_overlap) for x in region_sets.keys()}
    109         target_regions_in_db = list(set(sum([target_to_db_dict[x]['Query'].tolist() for x in target_to_db_dict.keys()],[])))
    110     elif type(region_sets) == pr.PyRanges:

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/, in target_to_query(target, query, fraction_overlap)
    278     query_pr=pr.read_bed(query)
    279 if isinstance(query, list):
--> 280     query_pr=pr.PyRanges(region_names_to_coordinates(query))
    281 if isinstance(query, pr.PyRanges):
    282     query_pr=query

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/, in region_names_to_coordinates(region_names)
     29 chrom=pd.DataFrame([i.split(':', 1)[0] for i in region_names if ':' in i])
     30 coor = [i.split(':', 1)[1] for i in region_names if ':' in i]
---> 31 start=pd.DataFrame([int(i.split('-', 1)[0]) for i in coor])
     32 end=pd.DataFrame([int(i.split('-', 1)[1]) for i in coor])
     33 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/, in <listcomp>(.0)
     29 chrom=pd.DataFrame([i.split(':', 1)[0] for i in region_names if ':' in i])
     30 coor = [i.split(':', 1)[1] for i in region_names if ':' in i]
---> 31 start=pd.DataFrame([int(i.split('-', 1)[0]) for i in coor])
     32 end=pd.DataFrame([int(i.split('-', 1)[1]) for i in coor])
     33 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)

ValueError: invalid literal for int() with base 10: '1.17e+08'

Screenshots Here are screenshots of my custom cisTarget feather DBs after reading them in with "pandas.read_feather" (the first few and last few columns of the scores and rankings):

[Four screenshots of the scores and rankings databases, taken 2024-04-25]

Version (please complete the following information):

I'm not sure why I'm getting this error and if it's from my custom DB or something else.

Thank you for your help!

SeppeDeWinter commented 5 months ago

Hi @mwetzel7

Indeed, this wrapper function is now deprecated. I would suggest following the new tutorials on:

As for your error, can you show what your region_sets looks like?

All the best,


mwetzel7 commented 5 months ago

Hi Seppe,

Thanks for the updated link.

Here is the output of my region_sets object:

{'topics': {'Topic1': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 201986333 | 201988638 |
| chr1         | 202025595 | 202028101 |
| chr1         | 1708993   | 1712275   |
| chr1         | 223853125 | 223854294 |
| ...          | ...       | ...       |
| chrX         | 16042126  | 16042993  |
| chrX         | 3012333   | 3012751   |
| chrX         | 20159306  | 20160422  |
| chrX         | 17377623  | 17378030  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,742 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic2': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 209989122 | 209990158 |
| chr1         | 161043203 | 161045212 |
| chr1         | 167632104 | 167633831 |
| chr1         | 109805761 | 109806941 |
| ...          | ...       | ...       |
| chrX         | 106817415 | 106818172 |
| chrX         | 49686815  | 49687669  |
| chrX         | 114257621 | 114258236 |
| chrX         | 103172609 | 103174306 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,428 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic3': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 170095644 | 170096356 |
| chr1         | 177178306 | 177178991 |
| chr1         | 150539385 | 150547703 |
| chr1         | 157938117 | 157940226 |
| ...          | ...       | ...       |
| chrX         | 118107760 | 118111129 |
| chrX         | 45709848  | 45711508  |
| chrX         | 39589971  | 39590364  |
| chrX         | 40856170  | 40856613  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,507 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic4': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 203069483 | 203070178 |
| chr1         | 181080723 | 181082733 |
| chr1         | 75118075  | 75119037  |
| chr1         | 92791818  | 92792769  |
| ...          | ...       | ...       |
| chrX         | 133593737 | 133594720 |
| chrX         | 13503234  | 13503688  |
| chrX         | 21824867  | 21825200  |
| chrX         | 65144901  | 65145415  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,619 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic5': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 168732171 | 168732836 |
| chr1         | 152878275 | 152879255 |
| chr1         | 175439944 | 175441001 |
| chr1         | 97025400  | 97026316  |
| ...          | ...       | ...       |
| chrX         | 9880975   | 9881504   |
| chrX         | 62780548  | 62781211  |
| chrX         | 115630870 | 115631435 |
| chrX         | 47059803  | 47060150  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,702 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic6': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 111019479 | 111020693 |
| chr1         | 231761911 | 231764351 |
| chr1         | 156783309 | 156785678 |
| chr1         | 173159297 | 173160632 |
| ...          | ...       | ...       |
| chrX         | 24068560  | 24069012  |
| chrX         | 116510328 | 116510898 |
| chrX         | 23799330  | 23800080  |
| chrX         | 101914619 | 101915215 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,735 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic7': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 9648373   | 9650268   |
| chr1         | 86042110  | 86044683  |
| chr1         | 6761479   | 6762421   |
| chr1         | 1891134   | 1891915   |
| ...          | ...       | ...       |
| chrX         | 153941048 | 153941760 |
| chrX         | 122866614 | 122867192 |
| chrX         | 48858439  | 48859194  |
| chrX         | 40594391  | 40595602  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,800 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic8': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 7022237   | 7023848   |
| chr1         | 45284563  | 45286678  |
| chr1         | 175138133 | 175139160 |
| chr1         | 230818904 | 230820453 |
| ...          | ...       | ...       |
| chrX         | 17626962  | 17627475  |
| chrX         | 43365026  | 43365394  |
| chrX         | 9504216   | 9504473   |
| chrX         | 128494140 | 128494570 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 7,684 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic9': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 198624686 | 198627472 |
| chr1         | 56184332  | 56185091  |
| chr1         | 228996555 | 228997478 |
| chr1         | 147016695 | 147017409 |
| ...          | ...       | ...       |
| chrX         | 17423245  | 17424040  |
| chrX         | 71249709  | 71250540  |
| chrX         | 122221208 | 122221661 |
| chrX         | 56771419  | 56771830  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,849 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic10': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 167416630 | 167417442 |
| chr1         | 115211696 | 115214930 |
| chr1         | 205894584 | 205895783 |
| chr1         | 244504701 | 244505629 |
| ...          | ...       | ...       |
| chrX         | 128979530 | 128980262 |
| chrX         | 24524109  | 24524582  |
| chrX         | 77631331  | 77631947  |
| chrX         | 129095071 | 129095677 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,178 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic11': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 209779814 | 209787548 |
| chr1         | 21620355  | 21621875  |
| chr1         | 225887664 | 225888597 |
| chr1         | 183149634 | 183150592 |
| ...          | ...       | ...       |
| chrX         | 73755176  | 73756869  |
| chrX         | 16042126  | 16042993  |
| chrX         | 39754585  | 39755340  |
| chrX         | 19192737  | 19193156  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,240 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic12': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 85100087  | 85101090  |
| chr1         | 147229238 | 147230259 |
| chr1         | 75118075  | 75119037  |
| chr1         | 201425495 | 201426631 |
| ...          | ...       | ...       |
| chrX         | 152965100 | 152966525 |
| chrX         | 46630167  | 46630620  |
| chrX         | 54466448  | 54467418  |
| chrX         | 153058811 | 153060639 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,727 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic13': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 15850334  | 15853944  |
| chr1         | 59245323  | 59251724  |
| chr1         | 154942503 | 154948437 |
| chr1         | 212779946 | 212783210 |
| ...          | ...       | ...       |
| chrX         | 152973757 | 152974461 |
| chrX         | 118986485 | 118987412 |
| chrX         | 149369046 | 149369708 |
| chrX         | 152863486 | 152865163 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 1,926 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic14': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 149856121 | 149860717 |
| chr1         | 113931133 | 113934939 |
| chr1         | 110880229 | 110883171 |
| chr1         | 114446833 | 114448476 |
| ...          | ...       | ...       |
| chrX         | 38662680  | 38665543  |
| chrX         | 23825231  | 23826259  |
| chrX         | 1571450   | 1573499   |
| chrX         | 153235448 | 153238827 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,382 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic15': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 115721778 | 115722952 |
| chr1         | 159046137 | 159047578 |
| chr1         | 86072413  | 86073567  |
| chr1         | 117079736 | 117081832 |
| ...          | ...       | ...       |
| chrX         | 29680185  | 29681712  |
| chrX         | 49040631  | 49041569  |
| chrX         | 102911509 | 102912317 |
| chrX         | 153275683 | 153276467 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,726 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic16': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 207174932 | 207176903 |
| chr1         | 201986333 | 201988638 |
| chr1         | 181086404 | 181087520 |
| chr1         | 181088207 | 181089299 |
| ...          | ...       | ...       |
| chrX         | 152965100 | 152966525 |
| chrX         | 128979530 | 128980262 |
| chrX         | 12759422  | 12760005  |
| chrX         | 112083609 | 112085109 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,821 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic17': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 95107099  | 95108569  |
| chr1         | 64808739  | 64810266  |
| chr1         | 94167026  | 94168182  |
| chr1         | 117184849 | 117186550 |
| ...          | ...       | ...       |
| chrX         | 106045355 | 106046379 |
| chrX         | 46119847  | 46120427  |
| chrX         | 55945818  | 55946748  |
| chrX         | 22264661  | 22265196  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,750 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic18': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 118147617 | 118150425 |
| chr1         | 11967662  | 11970199  |
| chr1         | 156629521 | 156632021 |
| chr1         | 173792932 | 173795064 |
| ...          | ...       | ...       |
| chrX         | 54665641  | 54666702  |
| chrX         | 49011554  | 49012955  |
| chrX         | 152109751 | 152110779 |
| chrX         | 47517880  | 47518995  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2,497 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic19': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 78004593  | 78006830  |
| chr1         | 183247884 | 183249467 |
| chr1         | 201278389 | 201281177 |
| chr1         | 71769188  | 71769943  |
| ...          | ...       | ...       |
| chrX         | 54518832  | 54519227  |
| chrX         | 109411080 | 109412148 |
| chrX         | 48827844  | 48828824  |
| chrX         | 151988254 | 151988654 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 5,390 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'Topic20': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 113006903 | 113009217 |
| chr1         | 84503534  | 84504271  |
| chr1         | 197752445 | 197753288 |
| chr1         | 155144977 | 155151656 |
| ...          | ...       | ...       |
| chrX         | 106983255 | 106983693 |
| chrX         | 62654009  | 62654923  |
| chrX         | 53009601  | 53010033  |
| chrX         | 134232613 | 134233347 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,678 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.}, 'DARs': {'treated': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 112198722 | 112199143 |
| chr1         | 223564008 | 223564305 |
| chr1         | 153019004 | 153019252 |
| chr1         | 211987098 | 211987288 |
| ...          | ...       | ...       |
| chrX         | 10812386  | 10812848  |
| chrX         | 115412378 | 115412710 |
| chrX         | 55945818  | 55946748  |
| chrX         | 56253964  | 56254426  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 7,202 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome., 'untreated': +--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int32)   | (int32)   |
|--------------+-----------+-----------|
| chr1         | 50712759  | 50712942  |
| chr1         | 100918257 | 100918507 |
| chr1         | 85048919  | 85049240  |
| chr1         | 61753362  | 61753709  |
| ...          | ...       | ...       |
| chrX         | 19865019  | 19865385  |
| chrX         | 24524109  | 24524582  |
| chrX         | 128811893 | 128812654 |
| chrX         | 150017336 | 150017897 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4,418 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.}}

SeppeDeWinter commented 4 months ago

Hi @mwetzel7

Not sure what is going on here.

Can you try the following:

# read your feather database (replace the placeholder with the real path)
import pandas as pd
db = pd.read_feather("<PATH_TO_DATABASE>").drop("motifs", axis = 1)

from pycistarget.utils import region_names_to_coordinates

# every remaining column name is expected to be a region name
test = region_names_to_coordinates(db.columns)

Does that produce the same error?



mwetzel7 commented 4 months ago

Hi @SeppeDeWinter

Yes, running that code does produce the same error (for both rankings and scores DB):

ValueError                                Traceback (most recent call last)
Cell In[15], line 9
      5 db = pd.read_feather(rankings_db).drop("motifs", axis = 1)
      7 from pycistarget.utils import region_names_to_coordinates
----> 9 test = region_names_to_coordinates(db.columns)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/, in region_names_to_coordinates(region_names)
     29 chrom=pd.DataFrame([i.split(':', 1)[0] for i in region_names if ':' in i])
     30 coor = [i.split(':', 1)[1] for i in region_names if ':' in i]
---> 31 start=pd.DataFrame([int(i.split('-', 1)[0]) for i in coor])
     32 end=pd.DataFrame([int(i.split('-', 1)[1]) for i in coor])
     33 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)

File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/, in <listcomp>(.0)
     29 chrom=pd.DataFrame([i.split(':', 1)[0] for i in region_names if ':' in i])
     30 coor = [i.split(':', 1)[1] for i in region_names if ':' in i]
---> 31 start=pd.DataFrame([int(i.split('-', 1)[0]) for i in coor])
     32 end=pd.DataFrame([int(i.split('-', 1)[1]) for i in coor])
     33 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)

ValueError: invalid literal for int() with base 10: '1.17e+08'

SeppeDeWinter commented 4 months ago

Hi @mwetzel7

Then somehow there is something wrong with the region names in those databases.

You can check which one is the culprit by running:

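# read your feather database (replace the placeholder with the real path)
import pandas as pd
db = pd.read_feather("<PATH_TO_DATABASE>").drop("motifs", axis = 1)

from pycistarget.utils import region_names_to_coordinates

# parse one name at a time; the loop stops with a ValueError at the first
# malformed name, and `region` then still holds that name
for region in db.columns:
    test = region_names_to_coordinates([region])

mwetzel7 commented 4 months ago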

Hi @SeppeDeWinter

Using the above code I was able to see one column named incorrectly: chr8:1.17e+08-117000643
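
A loop that stops at the first failure only reveals one bad name at a time. As a minimal sketch, the check can also be done with a regular expression so that every malformed column is listed in one pass. The pattern and the function name `find_malformed_regions` are assumptions for illustration and are not part of pycistarget; the column list is made up, and in practice it would be `db.columns` after dropping "motifs".

```python
import re

# Assumed naming scheme: "chrom:start-end" with plain decimal integers.
REGION_RE = re.compile(r"^[^:]+:\d+-\d+$")


def find_malformed_regions(region_names):
    """Return every name that is not 'chrom:start-end' with integer coordinates."""
    return [name for name in region_names if not REGION_RE.match(name)]


# Hypothetical column names standing in for db.columns.
columns = ["chr1:100-200", "chr8:1.17e+08-117000643", "chrX:3012333-3012751"]
bad = find_malformed_regions(columns)
print(bad)  # ['chr8:1.17e+08-117000643']
```

Note that this check is stricter than the parser in the traceback, which skips names without a colon instead of failing on them.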

I read the full db in and renamed this column to: chr8:117000000-117000643
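
The same rename can be scripted. The sketch below is one possible way to do it, not something taken from pycistarget: the helper `repair_region_name` is hypothetical, and it assumes the scientific-notation text stands for an exact integer, which cannot be verified from the name alone. It uses `decimal.Decimal` so that no floating-point rounding is introduced, and it refuses values that are not whole numbers.

```python
from decimal import Decimal


def repair_region_name(name):
    """Rewrite coordinates such as '1.17e+08' as plain integers.

    Assumption: the scientific-notation text stands for an exact integer.
    """
    chrom, coords = name.split(":", 1)
    start, end = coords.split("-", 1)
    fixed = []
    for value in (start, end):
        number = Decimal(value)
        if number != number.to_integral_value():
            raise ValueError(f"coordinate {value!r} is not a whole number")
        fixed.append(str(int(number)))
    return f"{chrom}:{fixed[0]}-{fixed[1]}"


repaired = repair_region_name("chr8:1.17e+08-117000643")
print(repaired)  # chr8:117000000-117000643

# With the database loaded as a pandas DataFrame, the column could then be
# renamed with: db = db.rename(columns={bad_name: repair_region_name(bad_name)})
```

It is still worth checking the repaired coordinates against the original region file, since the rounded text may have lost digits before the database was built.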

I can now get past the original error, but I am running into a new one: NameError: name 'run_cistarget' is not defined. Above you said the wrapper is deprecated and to follow the new tutorials; however, I do not see a tutorial in the link provided that runs pycisTarget. The tutorials on the pycisTarget page still say to use the run_cistarget wrapper from SCENIC+.

Sorry if I missed it, but how should I run pycisTarget now?

SeppeDeWinter commented 4 months ago

Hi @mwetzel7

Yes, I should still update the pycistarget tutorials. Sorry about that.

If you are running pycistarget in the context of a SCENIC+ analysis, you can follow this tutorial: . In that case the Snakemake pipeline will automatically run pycistarget.

If you want to run pycistarget on its own, it now has a command line interface; see the example below on how to use it.

pycistarget cistarget \
  --cistarget_db_fname <PATH_TO_YOUR_DATABASE> \
  --bed_fname <PATH_TO_YOUR_BED_FILE> \
  --output_folder <PATH_TO_OUTPUT_FOLDER> \
  --species <SPECIES_NAME> \
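
For anyone driving this from a notebook, the following is a rough sketch of how the call could be assembled in Python. Only the subcommand and the four flags shown above come from this thread; the helper names, the file paths, and the species value passed here are placeholders, and the command line may require further options that are not listed here. The sketch only builds the BED lines and the argument list; it does not run anything.

```python
import shlex


def regions_to_bed_lines(region_names):
    """Convert 'chrom:start-end' names to tab-separated BED lines."""
    lines = []
    for name in region_names:
        chrom, coords = name.split(":", 1)
        start, end = coords.split("-", 1)
        # int() raises ValueError on text such as '1.17e+08', so a malformed
        # name is caught here rather than inside the tool.
        lines.append(f"{chrom}\t{int(start)}\t{int(end)}")
    return lines


def build_cistarget_command(db_path, bed_path, output_folder, species):
    """Assemble the argument list using the flags quoted in this thread."""
    return [
        "pycistarget", "cistarget",
        "--cistarget_db_fname", db_path,
        "--bed_fname", bed_path,
        "--output_folder", output_folder,
        "--species", species,
    ]


# Placeholder inputs for illustration only.
regions = ["chr1:201986333-201988638", "chrX:16042126-16042993"]
bed_lines = regions_to_bed_lines(regions)
command = build_cistarget_command(
    "rankings.feather", "topic1.bed", "motifs", "homo_sapiens"
)
print("\n".join(bed_lines))
print(shlex.join(command))
```

Once the BED lines have been written to the file named in `--bed_fname`, the list can be passed to `subprocess.run(command, check=True)`.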

I hope that helps?

All the best,
