Open mwetzel7 opened 5 months ago
Hi @mwetzel7
Indeed, this wrapper function is now deprecated. I suggest following the new tutorials at https://scenicplus.readthedocs.io/en/latest/tutorials.html.
As for your error, can you show what your region_sets object looks like?
All the best,
Seppe
Hi Seppe,
Thanks for the updated link.
Here is the output of my region_sets object:
{'topics': {'Topic1': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 201986333 | 201988638 | | chr1 | 202025595 | 202028101 | | chr1 | 1708993 | 1712275 | | chr1 | 223853125 | 223854294 | | ... | ... | ... | | chrX | 16042126 | 16042993 | | chrX | 3012333 | 3012751 | | chrX | 20159306 | 20160422 | | chrX | 17377623 | 17378030 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3,742 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic2': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 209989122 | 209990158 | | chr1 | 161043203 | 161045212 | | chr1 | 167632104 | 167633831 | | chr1 | 109805761 | 109806941 | | ... | ... | ... | | chrX | 106817415 | 106818172 | | chrX | 49686815 | 49687669 | | chrX | 114257621 | 114258236 | | chrX | 103172609 | 103174306 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2,428 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic3': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 170095644 | 170096356 | | chr1 | 177178306 | 177178991 | | chr1 | 150539385 | 150547703 | | chr1 | 157938117 | 157940226 | | ... | ... | ... | | chrX | 118107760 | 118111129 | | chrX | 45709848 | 45711508 | | chrX | 39589971 | 39590364 | | chrX | 40856170 | 40856613 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3,507 rows and 3 columns from 23 chromosomes. 
For printing, the PyRanges was sorted on Chromosome., 'Topic4': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 203069483 | 203070178 | | chr1 | 181080723 | 181082733 | | chr1 | 75118075 | 75119037 | | chr1 | 92791818 | 92792769 | | ... | ... | ... | | chrX | 133593737 | 133594720 | | chrX | 13503234 | 13503688 | | chrX | 21824867 | 21825200 | | chrX | 65144901 | 65145415 | +--------------+-----------+-----------+ Unstranded PyRanges object has 4,619 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic5': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 168732171 | 168732836 | | chr1 | 152878275 | 152879255 | | chr1 | 175439944 | 175441001 | | chr1 | 97025400 | 97026316 | | ... | ... | ... | | chrX | 9880975 | 9881504 | | chrX | 62780548 | 62781211 | | chrX | 115630870 | 115631435 | | chrX | 47059803 | 47060150 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3,702 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic6': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 111019479 | 111020693 | | chr1 | 231761911 | 231764351 | | chr1 | 156783309 | 156785678 | | chr1 | 173159297 | 173160632 | | ... | ... | ... | | chrX | 24068560 | 24069012 | | chrX | 116510328 | 116510898 | | chrX | 23799330 | 23800080 | | chrX | 101914619 | 101915215 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3,735 rows and 3 columns from 23 chromosomes. 
For printing, the PyRanges was sorted on Chromosome., 'Topic7': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 9648373 | 9650268 | | chr1 | 86042110 | 86044683 | | chr1 | 6761479 | 6762421 | | chr1 | 1891134 | 1891915 | | ... | ... | ... | | chrX | 153941048 | 153941760 | | chrX | 122866614 | 122867192 | | chrX | 48858439 | 48859194 | | chrX | 40594391 | 40595602 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2,800 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic8': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 7022237 | 7023848 | | chr1 | 45284563 | 45286678 | | chr1 | 175138133 | 175139160 | | chr1 | 230818904 | 230820453 | | ... | ... | ... | | chrX | 17626962 | 17627475 | | chrX | 43365026 | 43365394 | | chrX | 9504216 | 9504473 | | chrX | 128494140 | 128494570 | +--------------+-----------+-----------+ Unstranded PyRanges object has 7,684 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic9': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 198624686 | 198627472 | | chr1 | 56184332 | 56185091 | | chr1 | 228996555 | 228997478 | | chr1 | 147016695 | 147017409 | | ... | ... | ... | | chrX | 17423245 | 17424040 | | chrX | 71249709 | 71250540 | | chrX | 122221208 | 122221661 | | chrX | 56771419 | 56771830 | +--------------+-----------+-----------+ Unstranded PyRanges object has 4,849 rows and 3 columns from 23 chromosomes. 
For printing, the PyRanges was sorted on Chromosome., 'Topic10': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 167416630 | 167417442 | | chr1 | 115211696 | 115214930 | | chr1 | 205894584 | 205895783 | | chr1 | 244504701 | 244505629 | | ... | ... | ... | | chrX | 128979530 | 128980262 | | chrX | 24524109 | 24524582 | | chrX | 77631331 | 77631947 | | chrX | 129095071 | 129095677 | +--------------+-----------+-----------+ Unstranded PyRanges object has 4,178 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic11': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 209779814 | 209787548 | | chr1 | 21620355 | 21621875 | | chr1 | 225887664 | 225888597 | | chr1 | 183149634 | 183150592 | | ... | ... | ... | | chrX | 73755176 | 73756869 | | chrX | 16042126 | 16042993 | | chrX | 39754585 | 39755340 | | chrX | 19192737 | 19193156 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3,240 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic12': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 85100087 | 85101090 | | chr1 | 147229238 | 147230259 | | chr1 | 75118075 | 75119037 | | chr1 | 201425495 | 201426631 | | ... | ... | ... | | chrX | 152965100 | 152966525 | | chrX | 46630167 | 46630620 | | chrX | 54466448 | 54467418 | | chrX | 153058811 | 153060639 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2,727 rows and 3 columns from 23 chromosomes. 
For printing, the PyRanges was sorted on Chromosome., 'Topic13': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 15850334 | 15853944 | | chr1 | 59245323 | 59251724 | | chr1 | 154942503 | 154948437 | | chr1 | 212779946 | 212783210 | | ... | ... | ... | | chrX | 152973757 | 152974461 | | chrX | 118986485 | 118987412 | | chrX | 149369046 | 149369708 | | chrX | 152863486 | 152865163 | +--------------+-----------+-----------+ Unstranded PyRanges object has 1,926 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic14': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 149856121 | 149860717 | | chr1 | 113931133 | 113934939 | | chr1 | 110880229 | 110883171 | | chr1 | 114446833 | 114448476 | | ... | ... | ... | | chrX | 38662680 | 38665543 | | chrX | 23825231 | 23826259 | | chrX | 1571450 | 1573499 | | chrX | 153235448 | 153238827 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2,382 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic15': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 115721778 | 115722952 | | chr1 | 159046137 | 159047578 | | chr1 | 86072413 | 86073567 | | chr1 | 117079736 | 117081832 | | ... | ... | ... | | chrX | 29680185 | 29681712 | | chrX | 49040631 | 49041569 | | chrX | 102911509 | 102912317 | | chrX | 153275683 | 153276467 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2,726 rows and 3 columns from 23 chromosomes. 
For printing, the PyRanges was sorted on Chromosome., 'Topic16': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 207174932 | 207176903 | | chr1 | 201986333 | 201988638 | | chr1 | 181086404 | 181087520 | | chr1 | 181088207 | 181089299 | | ... | ... | ... | | chrX | 152965100 | 152966525 | | chrX | 128979530 | 128980262 | | chrX | 12759422 | 12760005 | | chrX | 112083609 | 112085109 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2,821 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic17': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 95107099 | 95108569 | | chr1 | 64808739 | 64810266 | | chr1 | 94167026 | 94168182 | | chr1 | 117184849 | 117186550 | | ... | ... | ... | | chrX | 106045355 | 106046379 | | chrX | 46119847 | 46120427 | | chrX | 55945818 | 55946748 | | chrX | 22264661 | 22265196 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2,750 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic18': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 118147617 | 118150425 | | chr1 | 11967662 | 11970199 | | chr1 | 156629521 | 156632021 | | chr1 | 173792932 | 173795064 | | ... | ... | ... | | chrX | 54665641 | 54666702 | | chrX | 49011554 | 49012955 | | chrX | 152109751 | 152110779 | | chrX | 47517880 | 47518995 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2,497 rows and 3 columns from 23 chromosomes. 
For printing, the PyRanges was sorted on Chromosome., 'Topic19': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 78004593 | 78006830 | | chr1 | 183247884 | 183249467 | | chr1 | 201278389 | 201281177 | | chr1 | 71769188 | 71769943 | | ... | ... | ... | | chrX | 54518832 | 54519227 | | chrX | 109411080 | 109412148 | | chrX | 48827844 | 48828824 | | chrX | 151988254 | 151988654 | +--------------+-----------+-----------+ Unstranded PyRanges object has 5,390 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome., 'Topic20': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 113006903 | 113009217 | | chr1 | 84503534 | 84504271 | | chr1 | 197752445 | 197753288 | | chr1 | 155144977 | 155151656 | | ... | ... | ... | | chrX | 106983255 | 106983693 | | chrX | 62654009 | 62654923 | | chrX | 53009601 | 53010033 | | chrX | 134232613 | 134233347 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3,678 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome.}, 'DARs': {'treated': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 112198722 | 112199143 | | chr1 | 223564008 | 223564305 | | chr1 | 153019004 | 153019252 | | chr1 | 211987098 | 211987288 | | ... | ... | ... | | chrX | 10812386 | 10812848 | | chrX | 115412378 | 115412710 | | chrX | 55945818 | 55946748 | | chrX | 56253964 | 56254426 | +--------------+-----------+-----------+ Unstranded PyRanges object has 7,202 rows and 3 columns from 23 chromosomes. 
For printing, the PyRanges was sorted on Chromosome., 'untreated': +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int32) | (int32) | |--------------+-----------+-----------| | chr1 | 50712759 | 50712942 | | chr1 | 100918257 | 100918507 | | chr1 | 85048919 | 85049240 | | chr1 | 61753362 | 61753709 | | ... | ... | ... | | chrX | 19865019 | 19865385 | | chrX | 24524109 | 24524582 | | chrX | 128811893 | 128812654 | | chrX | 150017336 | 150017897 | +--------------+-----------+-----------+ Unstranded PyRanges object has 4,418 rows and 3 columns from 23 chromosomes. For printing, the PyRanges was sorted on Chromosome.}}
Hi @mwetzel7
Not sure what is going on here.
Can you try the following:
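# Read your feather database and drop the non-region "motifs" column
import pandas as pd
from pycistarget.utils import region_names_to_coordinates

db = pd.read_feather("<PATH_TO_DATABASE>").drop("motifs", axis=1)
# Parse the remaining column names into chromosome/start/end coordinates
test = region_names_to_coordinates(db.columns)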
Does that produce the same error?
Best,
Seppe
Hi @SeppeDeWinter
Yes, running that code does produce the same error (for both rankings and scores DB):
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[15], line 9
      5 db = pd.read_feather(rankings_db).drop("motifs", axis = 1)
      7 from pycistarget.utils import region_names_to_coordinates
----> 9 test = region_names_to_coordinates(db.columns)
File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/utils.py:31, in region_names_to_coordinates(region_names)
     29 chrom=pd.DataFrame([i.split(':', 1)[0] for i in region_names if ':' in i])
     30 coor = [i.split(':', 1)[1] for i in region_names if ':' in i]
---> 31 start=pd.DataFrame([int(i.split('-', 1)[0]) for i in coor])
     32 end=pd.DataFrame([int(i.split('-', 1)[1]) for i in coor])
     33 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)
File ~/anaconda3/envs/scenicplus/lib/python3.11/site-packages/pycistarget/utils.py:31, in <listcomp>(.0)
     29 chrom=pd.DataFrame([i.split(':', 1)[0] for i in region_names if ':' in i])
     30 coor = [i.split(':', 1)[1] for i in region_names if ':' in i]
---> 31 start=pd.DataFrame([int(i.split('-', 1)[0]) for i in coor])
     32 end=pd.DataFrame([int(i.split('-', 1)[1]) for i in coor])
     33 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)
ValueError: invalid literal for int() with base 10: '1.17e+08'
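The failure can be reproduced without the database. The sketch below mirrors the split-then-int() steps visible in the traceback; parse_region is an illustrative helper written for this example, not pycistarget code, and the two region names are sample inputs.

```python
def parse_region(name):
    """Split 'chrom:start-end' using the same steps shown in the traceback."""
    chrom, coor = name.split(":", 1)
    start, end = coor.split("-", 1)
    # int() only accepts plain integer literals, so '1.17e+08' is rejected here
    return chrom, int(start), int(end)


# A well-formed region name parses cleanly
good = parse_region("chr8:117000000-117000643")

# A start coordinate written in scientific notation raises ValueError
try:
    parse_region("chr8:1.17e+08-117000643")
    error = None
except ValueError as exc:
    error = str(exc)

print(good)
print(error)
```

So a single region name with a coordinate in scientific notation is enough to make the whole call fail.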
Hi @mwetzel7
Then somehow there is something wrong with the region names in those databases.
You can check which one is the culprit by running:
# Read your feather database and drop the non-region "motifs" column
import pandas as pd
from pycistarget.utils import region_names_to_coordinates

db = pd.read_feather("<PATH_TO_DATABASE>").drop("motifs", axis=1)
# Parse each region name on its own and print the ones that fail
for region in db.columns:
    try:
        test = region_names_to_coordinates([region])
    except Exception:
        print(region)
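A similar check can be done with a regular expression alone, which may be handy when pycistarget is not importable. This is a minimal sketch: it assumes region names follow the chrom:start-end convention with plain integer coordinates, and find_malformed_regions is a hypothetical helper, not part of any library.

```python
import re

# Assumed convention: chrom:start-end, where start and end are plain integers
REGION_PATTERN = re.compile(r"[^:]+:\d+-\d+")


def find_malformed_regions(region_names):
    """Return the names that do not fully match the assumed convention."""
    return [name for name in region_names if not REGION_PATTERN.fullmatch(name)]


# Sample column names; with a real database, pass its region column names instead
columns = [
    "chr1:201986333-201988638",
    "chr8:1.17e+08-117000643",
    "chrX:16042126-16042993",
]
bad = find_malformed_regions(columns)
print(bad)
```

Unlike a try/except around the parser, this also flags names that would parse but contain unexpected characters in the coordinates.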
Hi @SeppeDeWinter
Using the above code I was able to see one column named incorrectly: chr8:1.17e+08-117000643
I read the full db in and renamed this column to: chr8:117000000-117000643
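The manual rename could also be scripted. The sketch below rewrites coordinates given in scientific notation as plain integers; repair_region_name and to_int_coordinate are hypothetical helpers. It assumes the true coordinate is exactly the value the notation encodes, which is worth verifying against the regions used to build the database, since digits may have been lost when the name was written.

```python
from decimal import Decimal


def to_int_coordinate(text):
    """Convert '1.17e+08' or '117000643' to an int, rejecting fractional values."""
    value = Decimal(text)
    if value != value.to_integral_value():
        raise ValueError(f"not a whole-number coordinate: {text!r}")
    return int(value)


def repair_region_name(name):
    """Rewrite 'chrom:start-end' so both coordinates are plain integers."""
    chrom, coor = name.split(":", 1)
    start, end = coor.split("-", 1)
    return f"{chrom}:{to_int_coordinate(start)}-{to_int_coordinate(end)}"


fixed = repair_region_name("chr8:1.17e+08-117000643")
unchanged = repair_region_name("chr1:201986333-201988638")
print(fixed)
print(unchanged)
```

With pandas, a mapping from old to repaired names can be passed to DataFrame.rename(columns=...) before writing the database back out.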
That gets me past the original error, but I am now running into a different one: NameError: name 'run_cistarget' is not defined. Above you said the wrapper is deprecated and to follow the new tutorials; however, I do not see a tutorial at the link provided that runs pycisTarget. The tutorials on the pycisTarget page still say to use the run_cistarget wrapper from SCENIC+.
Sorry if I missed it, but how should I run pycisTarget now?
Hi @mwetzel7
Yes, I should still update the pycistarget tutorials. Sorry about that.
If you are running pycistarget in the context of a SCENIC+ analysis, you can follow this tutorial: https://scenicplus.readthedocs.io/en/latest/human_cerebellum.html. In that case the Snakemake pipeline will run pycistarget automatically.
If you want to run pycistarget on its own, it now has a command-line interface. See the example below for how to use it.
pycistarget cistarget \
--cistarget_db_fname <PATH_TO_YOUR_DATABASE> \
--bed_fname <PATH_TO_YOUR_BED_FILE> \
--output_folder <PATH_TO_OUTPUT_FOLDER> \
--species <SPECIES_NAME> \
--write_html
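The --bed_fname argument takes a BED file, whereas the region_sets object earlier in the thread holds region tables in memory. Below is a minimal sketch of writing one region set to a three-column, tab-separated BED file. Whether the command line interface needs additional columns is not confirmed in this thread; write_bed and the file name Topic1.bed are illustrative.

```python
import os
import tempfile


def write_bed(regions, path):
    """Write (chrom, start, end) tuples as tab-separated BED lines."""
    with open(path, "w") as handle:
        for chrom, start, end in regions:
            handle.write(f"{chrom}\t{start}\t{end}\n")


# The first two regions of Topic1 from the output earlier in the thread
regions = [
    ("chr1", 201986333, 201988638),
    ("chr1", 202025595, 202028101),
]

bed_path = os.path.join(tempfile.mkdtemp(), "Topic1.bed")
write_bed(regions, bed_path)

# Read the file back to confirm what was written
with open(bed_path) as handle:
    lines = handle.read().splitlines()
print(lines)
```

The resulting path would then be the value given to --bed_fname, one file per region set.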
I hope that helps?
All the best,
Seppe
Describe the bug
When trying to run "run_pycistarget" I get this error:
ValueError: invalid literal for int() with base 10: '1.17e+08'
To Reproduce
I was following the 10x multiome tutorial, and trying to run the "run_pycistarget" wrapper from "scenicplus.wrappers.run_pycistarget". That doesn't seem available anymore, but I also tried with the cisTarget for SCENIC+ tutorial here: https://pycistarget.readthedocs.io/en/latest/pycistarget_scenic%2B_wrapper.html and got the same error. For inputs, I had created my own cisTarget databases (as my data were aligned to hg19) by following the provided tutorials to do so.
Error output
Screenshots
Here are screenshots of my custom cisTarget feather DBs after reading them in with "pandas.read_feather" (the first few and last few columns of the scores and rankings):
Version (please complete the following information):
I'm not sure why I'm getting this error and if it's from my custom DB or something else.
Thank you for your help!