Open sid5427 opened 1 year ago
Hi @sid5427
From the error I suspect that region_sets['DARs'] might be empty or contain empty entries.
Could you show the output of region_sets['DARs'] to confirm this?
On your question whether it is possible to run SCENIC+ with a couple of the partial results: yes, this is possible. You can generate the menr dictionary like this (in your case):
import dill

# menr must exist as a dictionary before the partial results are added to it
menr = {}
CTX_topics_otsu_All = dill.load(open('results/motifs/CTX_topics_otsu_All.pkl', 'rb'))
DEM_topics_otsu_All = dill.load(open('results/motifs/DEM_topics_otsu_All.pkl', 'rb'))
CTX_topics_top_3_All = dill.load(open('results/motifs/CTX_topics_top_3_All.pkl', 'rb'))
DEM_topics_top_3_All = dill.load(open('results/motifs/DEM_topics_top_3_All.pkl', 'rb'))
menr['CTX_topics_otsu_All'] = CTX_topics_otsu_All
menr['DEM_topics_otsu_All'] = DEM_topics_otsu_All
menr['CTX_topics_top_3_All'] = CTX_topics_top_3_All
menr['DEM_topics_top_3_All'] = DEM_topics_top_3_All
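If more partial result files exist than the four above, the same dictionary can be built in a loop. The following is only a minimal sketch and not part of SCENIC+: load_partial_menr is a hypothetical helper name, the directory layout is assumed to match the paths used above, and the fallback to the standard pickle module when dill is not installed is an assumption made so the sketch runs anywhere.

```python
from pathlib import Path

try:
    import dill as pickler  # same loader as in the snippet above
except ImportError:  # assumption: fall back to the stdlib if dill is missing
    import pickle as pickler


def load_partial_menr(motif_dir, prefixes=("CTX_", "DEM_")):
    """Collect partial motif-enrichment pickles into one dict.

    Hypothetical helper: keys are the file names without the .pkl suffix,
    e.g. 'CTX_topics_otsu_All', mirroring the manual assignments above.
    """
    menr = {}
    for path in sorted(Path(motif_dir).glob("*.pkl")):
        # Only pick up files whose name starts with one of the prefixes.
        if path.stem.startswith(tuple(prefixes)):
            with open(path, "rb") as handle:
                menr[path.stem] = pickler.load(handle)
    return menr
```

Usage would then be menr = load_partial_menr('results/motifs'). Files that do not start with one of the prefixes (for example a menr.pkl from an earlier run) are ignored.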
Best,
Seppe
Hi Seppe,
That's the weird part - when I run the code section for finding DARs in markers_dict
for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    #print(regions)
    region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
    print("pr.PyRanges(region_names_to_coordinates(regions))")
I get this error -
pr.PyRanges(region_names_to_coordinates(regions))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[14], line 4
2 regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
3 #print(regions)
----> 4 region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
5 print("pr.PyRanges(region_names_to_coordinates(regions))")
File ~/testing_area/pycistarget/pycistarget/utils.py:33, in region_names_to_coordinates(region_names)
31 regiondf=pd.concat([chrom, start, end], axis=1, sort=False)
32 regiondf.index=[i for i in region_names if ':' in i]
---> 33 regiondf.columns=['Chromosome', 'Start', 'End']
34 return(regiondf)
File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/generic.py:5915, in NDFrame.__setattr__(self, name, value)
5913 try:
5914 object.__getattribute__(self, name)
-> 5915 return object.__setattr__(self, name, value)
5916 except AttributeError:
5917 pass
File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/_libs/properties.pyx:69, in pandas._libs.properties.AxisProperty.__set__()
File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/generic.py:823, in NDFrame._set_axis(self, axis, labels)
821 def _set_axis(self, axis: int, labels: AnyArrayLike | list) -> None:
822 labels = ensure_index(labels)
--> 823 self._mgr.set_axis(axis, labels)
824 self._clear_item_cache()
File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/internals/managers.py:227, in BaseBlockManager.set_axis(self, axis, new_labels)
225 def set_axis(self, axis: int, new_labels: Index) -> None:
226 # Caller is responsible for ensuring we have an Index object.
--> 227 self._validate_set_axis(axis, new_labels)
228 self.axes[axis] = new_labels
File ~/.conda/envs/py_3_8/lib/python3.8/site-packages/pandas/core/internals/base.py:70, in DataManager._validate_set_axis(self, axis, new_labels)
67 pass
69 elif new_len != old_len:
---> 70 raise ValueError(
71 f"Length mismatch: Expected axis has {old_len} elements, new "
72 f"values have {new_len} elements"
73 )
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements
However, if I run region_sets['DARs'] after that, I get this:
{'BMCP': +--------------+-----------+-----------+
| Chromosome | Start | End |
| (category) | (int32) | (int32) |
|--------------+-----------+-----------|
| chr1 | 21353367 | 21353867 |
| chr1 | 27542533 | 27543033 |
| chr1 | 147377812 | 147378312 |
| chr1 | 186195347 | 186195847 |
| ... | ... | ... |
| chrX | 130179564 | 130180064 |
| chrX | 129957183 | 129957683 |
| chrX | 109848273 | 109848773 |
| chrX | 41257752 | 41258252 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,635 rows and 3 columns from 23 chromosomes.
For printing, the PyRanges was sorted on Chromosome.}
I went ahead and printed markers_dict, and this is what I get. It looks like SCENIC+ does not detect markers for certain cell types, even though the call markers_dict = find_diff_features(cistopic_obj, imputed_acc_obj, variable='celltype', var_features=variable_regions, split_pattern = '-') completes successfully:
{'BMCP': Log2FC Adjusted_pval Contrast
chr8:73520503-73521003 4.247113 0.0 BMCP
chr1:21353367-21353867 4.242884 0.0 BMCP
chr11:44780868-44781368 4.219286 0.0 BMCP
chr13:44397846-44398346 4.167223 0.0 BMCP
chr1:27542533-27543033 4.164805 0.0 BMCP
... ... ... ...
chr22:38768855-38769355 0.586214 0.0 BMCP
chr7:15977629-15978129 0.585935 0.0 BMCP
chr5:88884545-88885045 0.585763 0.0 BMCP
chr5:150129881-150130381 0.585325 0.0 BMCP
chr3:195853922-195854422 0.58524 0.0 BMCP
[3636 rows x 3 columns], 'CD14-Mono': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [], 'CD34 Gran-ATAC': Log2FC Adjusted_pval Contrast
chr12:2294663-2295163 4.291807 0.0 CD34 Gran-ATAC
chrX:15911664-15912164 4.25781 0.0 CD34 Gran-ATAC
chr3:128523503-128524003 4.191673 0.0 CD34 Gran-ATAC
chr9:129058106-129058606 4.185681 0.0 CD34 Gran-ATAC
chr13:28468022-28468522 4.104304 0.0 CD34 Gran-ATAC
... ... ... ...
chr9:127967948-127968448 0.585811 0.000347 CD34 Gran-ATAC
chrX:56813483-56813983 0.585558 0.000001 CD34 Gran-ATAC
chr14:35286835-35287335 0.585539 0.000006 CD34 Gran-ATAC
chr11:72787723-72788223 0.585364 0.000001 CD34 Gran-ATAC
chr2:20424749-20425249 0.58513 0.0 CD34 Gran-ATAC
[3357 rows x 3 columns], 'CLP': Log2FC Adjusted_pval Contrast
chr2:231672428-231672928 2.806754 0.001634 CLP
chr16:29339242-29339742 2.58527 0.00002 CLP
chr7:2723132-2723632 2.58264 0.00002 CLP
chr22:22935612-22936112 2.513794 0.00002 CLP
chr12:110647657-110648157 2.494059 0.00002 CLP
... ... ... ...
chr19:47336760-47337260 0.58605 0.01005 CLP
chr2:101474036-101474536 0.58595 0.000616 CLP
chr11:104653394-104653894 0.585924 0.022442 CLP
chr12:89663417-89663917 0.585352 0.045297 CLP
chr6:150142812-150143312 0.585031 0.000074 CLP
[4583 rows x 3 columns], 'ERP': Log2FC Adjusted_pval Contrast
chr1:21353367-21353867 4.939022 0.0 ERP
chr8:73520503-73521003 4.93843 0.0 ERP
chr11:44780868-44781368 4.91324 0.0 ERP
chr13:44397846-44398346 4.868945 0.0 ERP
chr1:27542533-27543033 4.864803 0.0 ERP
... ... ... ...
chr1:179586214-179586714 0.586495 0.0 ERP
chr17:36482888-36483388 0.586426 0.0 ERP
chr12:79882960-79883460 0.586234 0.000066 ERP
chr21:34201754-34202254 0.586081 0.0 ERP
chr7:138400431-138400931 0.585121 0.000039 ERP
[3377 rows x 3 columns], 'HSC CACNB2': Log2FC Adjusted_pval Contrast
chr17:13347389-13347889 0.914718 0.0 HSC CACNB2
chr3:152977353-152977853 0.900724 0.0 HSC CACNB2
chrX:72269788-72270288 0.897633 0.0 HSC CACNB2
chr5:119274871-119275371 0.897493 0.0 HSC CACNB2
chr4:155543343-155543843 0.897219 0.0 HSC CACNB2
... ... ... ...
chr15:34855325-34855825 0.585576 0.0 HSC CACNB2
chr15:52483673-52484173 0.585511 0.0 HSC CACNB2
chr10:71776295-71776795 0.585307 0.0 HSC CACNB2
chr2:15640184-15640684 0.585199 0.0 HSC CACNB2
chr21:45364168-45364668 0.585058 0.0 HSC CACNB2
[658 rows x 3 columns], 'HSC HIST1H2AC': Log2FC Adjusted_pval Contrast
chrX:13470339-13470839 1.188423 0.0 HSC HIST1H2AC
chr2:108288396-108288896 1.17873 0.0 HSC HIST1H2AC
chrX:98398402-98398902 1.174786 0.0 HSC HIST1H2AC
chr5:45507129-45507629 1.171706 0.0 HSC HIST1H2AC
chr6:170506315-170506815 1.168462 0.0 HSC HIST1H2AC
... ... ... ...
chr8:28347974-28348474 0.585566 0.0 HSC HIST1H2AC
chr12:60144359-60144859 0.585439 0.0 HSC HIST1H2AC
chr8:144292302-144292802 0.585366 0.0 HSC HIST1H2AC
chr17:61252140-61252640 0.58524 0.0 HSC HIST1H2AC
chr14:92586155-92586655 0.585017 0.0 HSC HIST1H2AC
[2866 rows x 3 columns], 'HSC MYADM-CD97': Log2FC Adjusted_pval Contrast
chr17:13347389-13347889 1.627802 0.0 HSC MYADM-CD97
chrX:72269788-72270288 1.593157 0.0 HSC MYADM-CD97
chr1:52139793-52140293 1.588745 0.0 HSC MYADM-CD97
chr1:207282009-207282509 1.573321 0.0 HSC MYADM-CD97
chr12:81668404-81668904 1.564987 0.0 HSC MYADM-CD97
... ... ... ...
chr9:121351196-121351696 0.585478 0.0 HSC MYADM-CD97
chr18:70781827-70782327 0.585219 0.0 HSC MYADM-CD97
chr2:20727942-20728442 0.585111 0.0 HSC MYADM-CD97
chr12:52259754-52260254 0.585101 0.0 HSC MYADM-CD97
chr17:63676658-63677158 0.585022 0.0 HSC MYADM-CD97
[2861 rows x 3 columns], 'HSC WNT11': Log2FC Adjusted_pval Contrast
chr20:34583552-34584052 0.897686 0.0 HSC WNT11
chr1:157465990-157466490 0.857838 0.0 HSC WNT11
chr17:69703839-69704339 0.854879 0.0 HSC WNT11
chr16:86415293-86415793 0.836379 0.0 HSC WNT11
chr9:99833712-99834212 0.836237 0.0 HSC WNT11
... ... ... ...
chr5:45507129-45507629 0.585922 0.0 HSC WNT11
chr18:11652338-11652838 0.585842 0.0 HSC WNT11
chr2:46866827-46867327 0.585681 0.0 HSC WNT11
chr2:219182404-219182904 0.585369 0.0 HSC WNT11
chrX:112889319-112889819 0.585211 0.0 HSC WNT11
[232 rows x 3 columns], 'LMPP CDK6-FLT3': Log2FC Adjusted_pval Contrast
chr17:60402734-60403234 1.689114 0.0 LMPP CDK6-FLT3
chr5:157867743-157868243 1.673037 0.0 LMPP CDK6-FLT3
chr6:119523493-119523993 1.670789 0.0 LMPP CDK6-FLT3
chr3:139202264-139202764 1.666878 0.0 LMPP CDK6-FLT3
chr5:98922016-98922516 1.653112 0.0 LMPP CDK6-FLT3
... ... ... ...
chr3:46926890-46927390 0.585351 0.0 LMPP CDK6-FLT3
chr19:41378266-41378766 0.585168 0.0 LMPP CDK6-FLT3
chr8:101314062-101314562 0.58502 0.0 LMPP CDK6-FLT3
chr22:29079289-29079789 0.584999 0.0 LMPP CDK6-FLT3
chr11:123484430-123484930 0.584964 0.0 LMPP CDK6-FLT3
[4573 rows x 3 columns], 'LMPP LSAMP': Log2FC Adjusted_pval Contrast
chr19:28388949-28389449 2.962269 0.0 LMPP LSAMP
chr2:108776418-108776918 2.960341 0.0 LMPP LSAMP
chr5:158825148-158825648 2.957966 0.0 LMPP LSAMP
chr10:33715201-33715701 2.956146 0.0 LMPP LSAMP
chr3:29030843-29031343 2.953672 0.0 LMPP LSAMP
... ... ... ...
chr13:41916484-41916984 0.585829 0.0 LMPP LSAMP
chr3:122271735-122272235 0.585776 0.0 LMPP LSAMP
chr20:19943058-19943558 0.585691 0.0 LMPP LSAMP
chr11:44611700-44612200 0.585415 0.0 LMPP LSAMP
chr15:38817713-38818213 0.58509 0.0 LMPP LSAMP
[5472 rows x 3 columns], 'LMPP Naive T-cell': Log2FC Adjusted_pval Contrast
chr2:231672428-231672928 5.046681 0.0 LMPP Naive T-cell
chr17:57552218-57552718 4.440933 0.0 LMPP Naive T-cell
chr2:234164455-234164955 4.180235 0.0 LMPP Naive T-cell
chr22:44025612-44026112 4.119872 0.0 LMPP Naive T-cell
chr11:65639492-65639992 4.04134 0.0 LMPP Naive T-cell
... ... ... ...
chr6:24666804-24667304 0.586347 0.0 LMPP Naive T-cell
chr22:48098172-48098672 0.586282 0.0 LMPP Naive T-cell
chr7:111408651-111409151 0.585472 0.000576 LMPP Naive T-cell
chr9:99129969-99130469 0.585327 0.000062 LMPP Naive T-cell
chr2:127829811-127830311 0.584968 0.000556 LMPP Naive T-cell
[1764 rows x 3 columns], 'LMPP PRSS1': Log2FC Adjusted_pval Contrast
chr10:33715201-33715701 2.414107 0.0 LMPP PRSS1
chr3:29030843-29031343 2.412636 0.0 LMPP PRSS1
chr21:38525418-38525918 2.412636 0.0 LMPP PRSS1
chr19:28388949-28389449 2.411385 0.0 LMPP PRSS1
chr20:53686072-53686572 2.410341 0.0 LMPP PRSS1
... ... ... ...
chr1:212596123-212596623 0.585835 0.0 LMPP PRSS1
chr19:19451403-19451903 0.585529 0.0 LMPP PRSS1
chr1:43836421-43836921 0.585337 0.0 LMPP PRSS1
chr6:31351814-31352314 0.585109 0.0 LMPP PRSS1
chr3:69092014-69092514 0.585054 0.0 LMPP PRSS1
[5788 rows x 3 columns], 'LT-HSC HLF': Log2FC Adjusted_pval Contrast
chr5:45507129-45507629 1.531193 0.0 LT-HSC HLF
chr6:170506315-170506815 1.528928 0.0 LT-HSC HLF
chr10:13133034-13133534 1.509826 0.0 LT-HSC HLF
chr22:37613498-37613998 1.505232 0.0 LT-HSC HLF
chr12:103540329-103540829 1.502772 0.0 LT-HSC HLF
... ... ... ...
chr14:24313086-24313586 0.585542 0.0 LT-HSC HLF
chr20:18589905-18590405 0.585467 0.0 LT-HSC HLF
chr17:75864253-75864753 0.585419 0.0 LT-HSC HLF
chr12:66776806-66777306 0.585236 0.0 LT-HSC HLF
chr8:109745381-109745881 0.585025 0.0 LT-HSC HLF
[3537 rows x 3 columns], 'MDP-2 GPR133': Log2FC Adjusted_pval Contrast
chr2:231672428-231672928 2.663041 0.00004 MDP-2 GPR133
chr16:29339242-29339742 2.645864 0.000032 MDP-2 GPR133
chr7:2723132-2723632 2.619371 0.000032 MDP-2 GPR133
chr12:110647657-110648157 2.560181 0.000032 MDP-2 GPR133
chr22:22935612-22936112 2.539039 0.000032 MDP-2 GPR133
... ... ... ...
chr19:3849137-3849637 0.585687 0.023213 MDP-2 GPR133
chr10:74058584-74059084 0.585513 0.000168 MDP-2 GPR133
chr3:184492115-184492615 0.585447 0.001824 MDP-2 GPR133
chrX:114268580-114269080 0.585185 0.002259 MDP-2 GPR133
chr7:139811046-139811546 0.585112 0.000575 MDP-2 GPR133
[4048 rows x 3 columns], 'MDP-pDC': Log2FC Adjusted_pval Contrast
chr2:231672428-231672928 5.797586 0.0 MDP-pDC
chr17:57552218-57552718 5.176988 0.0 MDP-pDC
chr2:234164455-234164955 4.96483 0.0 MDP-pDC
chr22:44025612-44026112 4.882341 0.0 MDP-pDC
chr11:65639492-65639992 4.872473 0.0 MDP-pDC
... ... ... ...
chr19:3324341-3324841 0.585875 0.000001 MDP-pDC
chr8:38830617-38831117 0.58583 0.000004 MDP-pDC
chr13:98484715-98485215 0.585555 0.006703 MDP-pDC
chr12:120437801-120438301 0.585127 0.000002 MDP-pDC
chr12:132558281-132558781 0.585089 0.000273 MDP-pDC
[4951 rows x 3 columns], 'MEP-MKP': Log2FC Adjusted_pval Contrast
chr8:73520503-73521003 3.73497 0.0 MEP-MKP
chr1:21353367-21353867 3.70901 0.0 MEP-MKP
chr11:44780868-44781368 3.69003 0.0 MEP-MKP
chr1:27542533-27543033 3.619486 0.0 MEP-MKP
chr13:44397846-44398346 3.613465 0.0 MEP-MKP
... ... ... ...
chr1:31162607-31163107 0.586068 0.0 MEP-MKP
chr7:94395369-94395869 0.585895 0.0 MEP-MKP
chr4:74134146-74134646 0.585226 0.0 MEP-MKP
chr17:29117081-29117581 0.585171 0.0 MEP-MKP
chr19:41367776-41368276 0.584975 0.0 MEP-MKP
[3786 rows x 3 columns], 'ML-Gran': Log2FC Adjusted_pval Contrast
chr2:239814585-239815085 1.545085 0.0 ML-Gran
chr7:2214382-2214882 1.53738 0.0 ML-Gran
chr10:11901863-11902363 1.43475 0.0 ML-Gran
chr6:5162341-5162841 1.428097 0.0 ML-Gran
chr4:6888842-6889342 1.397363 0.0 ML-Gran
... ... ... ...
chr22:35602121-35602621 0.585847 0.0 ML-Gran
chr11:93718133-93718633 0.58579 0.0 ML-Gran
chr16:84913249-84913749 0.585337 0.0 ML-Gran
chr11:1152253-1152753 0.585231 0.0 ML-Gran
chr2:88858054-88858554 0.585031 0.0 ML-Gran
[1035 rows x 3 columns], 'MPP Ribo-high': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [], 'MPP SPINK2-CD99': Empty DataFrame
Columns: [Log2FC, Adjusted_pval, Contrast]
Index: [], 'MultiLin-ATAC': Log2FC Adjusted_pval Contrast
chr12:2294663-2295163 1.908098 0.0 MultiLin-ATAC
chrX:15911664-15912164 1.876974 0.0 MultiLin-ATAC
chr3:128523503-128524003 1.864965 0.0 MultiLin-ATAC
chr9:129058106-129058606 1.818677 0.0 MultiLin-ATAC
chr14:59361084-59361584 1.807945 0.0 MultiLin-ATAC
... ... ... ...
chr8:100577336-100577836 0.585536 0.0 MultiLin-ATAC
chr17:82334744-82335244 0.585499 0.0 MultiLin-ATAC
chr10:30049091-30049591 0.585449 0.0 MultiLin-ATAC
chr15:89820641-89821141 0.58526 0.0 MultiLin-ATAC
chr16:68284770-68285270 0.585148 0.0 MultiLin-ATAC
[2195 rows x 3 columns], 'ST-HSC PBX1': Log2FC Adjusted_pval Contrast
chr16:59048819-59049319 0.69066 0.0 ST-HSC PBX1
chr1:100628676-100629176 0.689165 0.0 ST-HSC PBX1
chr2:195047452-195047952 0.686393 0.0 ST-HSC PBX1
chr9:3825541-3826041 0.686267 0.0 ST-HSC PBX1
chr15:35595622-35596122 0.685384 0.0 ST-HSC PBX1
... ... ... ...
chr4:21041593-21042093 0.585442 0.0 ST-HSC PBX1
chr1:209925957-209926457 0.585398 0.0 ST-HSC PBX1
chr1:169465253-169465753 0.585347 0.0 ST-HSC PBX1
chr13:98454535-98455035 0.585183 0.0 ST-HSC PBX1
chr18:36168255-36168755 0.585096 0.0 ST-HSC PBX1
[1022 rows x 3 columns], 'pre-Gran CP': Log2FC Adjusted_pval Contrast
chr12:2294663-2295163 3.18217 0.0 pre-Gran CP
chrX:15911664-15912164 3.137117 0.0 pre-Gran CP
chr3:128523503-128524003 3.08506 0.0 pre-Gran CP
chr9:129058106-129058606 3.076241 0.0 pre-Gran CP
chr13:28468022-28468522 3.014072 0.0 pre-Gran CP
... ... ... ...
chr6:117547619-117548119 0.585609 0.0 pre-Gran CP
chr4:146243417-146243917 0.585519 0.0 pre-Gran CP
chr3:50626386-50626886 0.585345 0.0 pre-Gran CP
chr18:73768053-73768553 0.585049 0.0 pre-Gran CP
chr6:10603219-10603719 0.584966 0.0 pre-Gran CP
[3673 rows x 3 columns], 'pre-MEP': Log2FC Adjusted_pval Contrast
chr10:71980251-71980751 1.818789 0.0 pre-MEP
chr10:12328693-12329193 1.813141 0.0 pre-MEP
chr3:189890471-189890971 1.809711 0.0 pre-MEP
chr14:29650519-29651019 1.807064 0.0 pre-MEP
chr9:591201-591701 1.798752 0.0 pre-MEP
... ... ... ...
chr8:84626097-84626597 0.58563 0.0 pre-MEP
chr2:126368715-126369215 0.58541 0.0 pre-MEP
chr6:87723209-87723709 0.585265 0.0 pre-MEP
chr16:19130706-19131206 0.585088 0.0 pre-MEP
chr11:32056360-32056860 0.585063 0.0 pre-MEP
[3227 rows x 3 columns], 'pre-PC': Log2FC Adjusted_pval Contrast
chr2:231672428-231672928 6.236564 0.0 pre-PC
chr17:57552218-57552718 5.575458 0.0 pre-PC
chr2:234164455-234164955 5.337587 0.0 pre-PC
chr22:44025612-44026112 5.195669 0.0 pre-PC
chr11:65639492-65639992 5.120273 0.0 pre-PC
... ... ... ...
chr12:11650900-11651400 0.588236 0.000293 pre-PC
chr13:30114859-30115359 0.586593 0.0 pre-PC
chr9:129459108-129459608 0.586559 0.0 pre-PC
chr1:92485186-92485686 0.586446 0.0 pre-PC
chr19:41530845-41531345 0.586325 0.0 pre-PC
[2235 rows x 3 columns]}
Hi @sid5427
Yes indeed, it's these empty dataframes in markers_dict that are causing the error (i.e. 'CD14-Mono', 'MPP Ribo-high' and 'MPP SPINK2-CD99').
You should remove these prior to running:
for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    #print(regions)
    region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
You can also do it like this:
for DAR in markers_dict.keys():
    regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')] #only keep regions on known chromosomes
    if len(regions) > 0:
        region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))
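The same check can also be applied once, up front, so that every later step only sees contrasts with at least one region. This is a minimal sketch: drop_empty_contrasts is a hypothetical helper, not a pycisTopic or SCENIC+ function. It relies only on len(), so it works for DataFrames as well as for the plain lists used here for illustration.

```python
def drop_empty_contrasts(markers_dict):
    """Split a dict of per-contrast tables into kept entries and dropped names."""
    kept = {name: table for name, table in markers_dict.items() if len(table) > 0}
    dropped = [name for name in markers_dict if name not in kept]
    return kept, dropped


# Toy stand-in for markers_dict: lists of region names instead of DataFrames.
toy_markers = {
    "BMCP": ["chr8:73520503-73521003", "chr1:21353367-21353867"],
    "CD14-Mono": [],
    "MPP Ribo-high": [],
}
kept, dropped = drop_empty_contrasts(toy_markers)
print(dropped)  # ['CD14-Mono', 'MPP Ribo-high']
```

A DataFrame whose regions all lie outside the 'chr' contigs would still pass this check, so the len(regions) > 0 guard inside the loop remains useful.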
These dataframes are empty because no regions passed the thresholds (by default an adjusted p-value < 0.05 and a fold change of 1.5, i.e. a log2 fold change of about 0.585, which matches the smallest Log2FC values in your output). You can also change these thresholds in the find_diff_features function to get more regions.
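To see why a contrast can end up empty, it helps to apply the two cut-offs by hand. The sketch below is illustrative only: passes_thresholds and the toy rows are made up, and the cut-off values are assumptions taken from the smallest Log2FC values in the output above (about 0.585, which is log2(1.5)) and from the 0.05 adjusted p-value mentioned here. Check the find_diff_features signature in your installed version for the actual parameter names and defaults.

```python
import math

LOG2FC_THR = math.log2(1.5)  # assumed cut-off, about 0.585
ADJ_PVAL_THR = 0.05          # assumed cut-off


def passes_thresholds(log2fc, adj_pval,
                      log2fc_thr=LOG2FC_THR, adj_pval_thr=ADJ_PVAL_THR):
    """Return True when a region clears both cut-offs."""
    return log2fc > log2fc_thr and adj_pval < adj_pval_thr


# Toy rows: (region, log2FC, adjusted p-value); all values are invented.
rows = [
    ("chr1:100-600", 0.90, 0.001),  # clears both cut-offs
    ("chr2:100-600", 0.40, 0.001),  # fold change too small
    ("chr3:100-600", 1.20, 0.200),  # not significant
]
kept = [region for region, lfc, pval in rows if passes_thresholds(lfc, pval)]
print(kept)  # ['chr1:100-600']
```

If no row of a contrast clears both cut-offs, the resulting table is empty, which is what happened for the three cell types listed above.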
Best,
Seppe
Hi Seppe,
Thanks for the solution - I'll incorporate that into my run. I had tried this to remove the three troublesome clusters -
##remove clusters CD14-Mono, MPP Ribo-high, MPP SPINK2-CD99
adata_filtered = adata[adata.obs['cell_type'] != 'MPP Ribo-high' ] #MPP Ribo-high
adata_filtered = adata_filtered[adata_filtered.obs['cell_type'] != 'CD14-Mono' ] #CD14-Mono
adata_filtered = adata_filtered[adata_filtered.obs['cell_type'] != 'MPP SPINK2-CD99' ] #MPP SPINK2-CD99
adata_filtered.obs.cell_type
adata = adata_filtered ##replace original adata with filtered one
del(adata_filtered)
This did work, and it generated a scenicplus object with some of the downstream figures. However I get an error later for this part -
from scenicplus.cistromes import TF_cistrome_correlation, generate_pseudobulks
generate_pseudobulks(
    scplus_obj = scplus_obj,
    variable = 'GEX_cell_type',
    auc_key = 'eRegulon_AUC_filtered',
    signature_key = 'Gene_based')
generate_pseudobulks(
    scplus_obj = scplus_obj,
    variable = 'GEX_cell_type',
    auc_key = 'eRegulon_AUC_filtered',
    signature_key = 'Region_based')
TF_cistrome_correlation(
    scplus_obj,
    use_pseudobulk = True,
    variable = 'GEX_cell_type',
    auc_key = 'eRegulon_AUC_filtered',
    signature_key = 'Gene_based',
    out_key = 'filtered_gene_based')
TF_cistrome_correlation(
    scplus_obj,
    use_pseudobulk = True,
    variable = 'GEX_cell_type',
    auc_key = 'eRegulon_AUC_filtered',
    signature_key = 'Region_based',
    out_key = 'filtered_region_based')
and this is the error -
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[19], line 3
1 from scenicplus.cistromes import TF_cistrome_correlation, generate_pseudobulks
----> 3 generate_pseudobulks(
4 scplus_obj = scplus_obj,
5 variable = 'GEX_cell_type',
6 auc_key = 'eRegulon_AUC_filtered',
7 signature_key = 'Gene_based')
8 generate_pseudobulks(
9 scplus_obj = scplus_obj,
10 variable = 'GEX_cell_type',
11 auc_key = 'eRegulon_AUC_filtered',
12 signature_key = 'Region_based')
14 TF_cistrome_correlation(
15 scplus_obj,
16 use_pseudobulk = True,
(...)
19 signature_key = 'Gene_based',
20 out_key = 'filtered_gene_based')
File ~/testing_area/scenicplus/src/scenicplus/cistromes.py:227, in generate_pseudobulks(scplus_obj, variable, normalize_expression, auc_key, signature_key, nr_cells, nr_pseudobulks, seed)
225 for x in range(nr_pseudobulks):
226 random.seed(x)
--> 227 sample_cells = sample(cells, nr_cells)
228 sub_dgem = dgem.loc[sample_cells, :].mean(axis=0)
229 sub_auc = cistromes_auc.loc[sample_cells, :].mean(axis=0)
File ~/.conda/envs/py_3_8/lib/python3.8/random.py:363, in Random.sample(self, population, k)
361 n = len(population)
362 if not 0 <= k <= n:
--> 363 raise ValueError("Sample larger than population or is negative")
364 result = [None] * k
365 setsize = 21 # size of a small set minus size of an empty list
ValueError: Sample larger than population or is negative
Is this related to my ad-hoc solution? Will using the code snippet you provided solve this error downstream?
Appreciate the help! Sid
Hi @sid5427
This is a known "bug", caused by the fact that you have a label in the GEX_cell_type annotation with fewer than 5 cells.
However, the fact that you're at this step means that SCENIC+ has indeed worked successfully. You can skip this optional step for now by setting calculate_TF_eGRN_correlation to False. I will fix this bug as soon as I have some time.
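The failure itself comes from Python's random.sample, which raises this ValueError whenever it is asked for more items than the population holds. Below is a minimal sketch of a pre-flight check, assuming a plain list with one annotation label per cell. groups_too_small is a hypothetical helper, and nr_cells should be set to whatever value is passed to generate_pseudobulks.

```python
import random
from collections import Counter


def groups_too_small(labels, nr_cells):
    """Return {label: count} for annotations with fewer than nr_cells cells."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < nr_cells}


# Toy annotation column: one label per cell (invented counts).
labels = ["BMCP"] * 12 + ["CLP"] * 3
print(groups_too_small(labels, nr_cells=5))  # {'CLP': 3}

# Reproducing the error: sampling 5 cells from a group of only 3.
try:
    random.sample(["cell_1", "cell_2", "cell_3"], 5)
except ValueError as err:
    print(err)  # exact wording may differ between Python versions
```

Any label reported by such a check would need to be merged with another group or removed before pseudobulks are generated for the column named by variable (here GEX_cell_type).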
Best,
Seppe
Hi,
You can use the function from this commit instead: https://github.com/aertslab/scenicplus/commit/6b4bdad3a7761904168702ba9b8c0c395b3afa45. It does not require generating pseudobulks beforehand.
Best,
Seppe
Same problem. I don't have menr.pkl and DEM_*_topics.pkl after running run_pycistarget. I have only CTX files. What could be the problem @SeppeDeWinter ?
@RosaDeSa
Did you get any error messages after running run_pycistarget?
If not, you can try running on a single core; this might reveal an error message that was not passed on properly.
Best,
Seppe
Thanks @SeppeDeWinter using a single core, it worked!
You did not see any error messages using a single core?
Best,
Seppe
Oddly, it ran without errors on a single core and produced all the output files. Best, Rosa
Similar problem: my markers_dict is empty, which may be why a core died while running run_pycistarget. It also did not create CTX_topics_otsu_All.pkl or any of the other pkl files. Instead, I only have the CTX_topics_otsu_All files. Should I combine all the HTML files, turn them into a pkl, and then run the above code?
Hi Seppe and other devs - Happy new year!
I am unfortunately facing another issue while running scenic with our human 10x multiome data.
When I run pycistarget it's throwing an error - "ValueError: A gene signature must have at least one gene."
Python version - 3.8.13
Scenic version - not sure. I updated to the latest version on December 30th; asking for the version returns "AttributeError: module 'scenicplus' has no attribute 'version'" (you might want to check this as well).
This is how I am setting up pycistarget to run ...
output log -
I looked at other error reports- namely - https://github.com/aertslab/scenicplus/issues/60
and tried the same command with save_partial = True and run_without_promoters = True.
I get the same error - error log
Would it be possible to create the scenic object and run the scenic+ function with a couple of the partial result pickle files e.g. CTX_topics_otsu_All.pkl instead of menr.pkl?
Thanks! Sid.