aertslab / pycistarget

pycistarget is a python module to perform motif enrichment analysis in sets of regions with different tools and identify high confidence TF cistromes.

Couldn't generate menr.pkl file for SCENIC+ #27

Open Jay2942023 opened 10 months ago

Jay2942023 commented 10 months ago

Hello,

I am trying to get familiar with the SCENIC+ workflow using the 10x multiome PBMC tutorial. However, pycistarget could not generate the menr.pkl file needed as SCENIC+ input. All I get are a few HTML files for different topics. I have attached the code and the error message below.

    from scenicplus.wrappers.run_pycistarget import run_pycistarget
    run_pycistarget(
        region_sets = region_sets,
        species = 'homo_sapiens',
        save_path = os.path.join(work_dir, 'motifs'),
        ctx_db_path = rankings_db,
        dem_db_path = scores_db,
        path_to_motif_annotations = motif_annotation,
        run_without_promoters = True,
        n_cpu = 8,
        _temp_dir = os.path.join(tmp_dir, 'ray_spill'),
        annotation_version = 'v10nr_clust',
    )

Jay2942023 commented 10 months ago

Here is the error:

      File "/usr/local/lib/python3.8/site-packages/pycistarget/motif_enrichment_dem.py", line 319, in __init__
        self.run(dem_db.db_scores, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/pycistarget/motif_enrichment_dem.py", line 365, in run
        region_groups = [create_groups(contrast = contrasts[x],
      File "/usr/local/lib/python3.8/site-packages/pycistarget/motif_enrichment_dem.py", line 365, in <listcomp>
        region_groups = [create_groups(contrast = contrasts[x],
      File "/usr/local/lib/python3.8/site-packages/pycistarget/motif_enrichment_dem.py", line 534, in create_groups
        bg_pr_overlap = pr.PyRanges(region_names_to_coordinates(background)).count_overlaps(annotation)
      File "/usr/local/lib/python3.8/site-packages/pycistarget/utils.py", line 33, in region_names_to_coordinates
        regiondf.columns=['Chromosome', 'Start', 'End']
      File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 5920, in __setattr__
        return object.__setattr__(self, name, value)
      File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
      File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 822, in _set_axis
        self._mgr.set_axis(axis, labels)
      File "/usr/local/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 228, in set_axis
        self._validate_set_axis(axis, new_labels)
      File "/usr/local/lib/python3.8/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
        raise ValueError(
    ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements

SeppeDeWinter commented 9 months ago

Hi @Jay2942023

Can you check whether the chromosome names in your sets of regions match the chromosome names in the genome annotation (pr_annot)?

All the best,

Seppe
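The check suggested above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `chromosome_of` and `compare_chromosomes` are hypothetical and not part of pycistarget, the toy inputs are made up, and it assumes region names follow the `chr:start-end` pattern seen in this thread.

```python
def chromosome_of(region_name):
    """Return the chromosome part of a 'chrom:start-end' region name."""
    return region_name.split(":", 1)[0]


def compare_chromosomes(region_names, annotation_chromosomes):
    """Report which chromosomes from the region names also occur in the annotation."""
    region_chroms = {chromosome_of(name) for name in region_names if ":" in name}
    annot_chroms = set(annotation_chromosomes)
    return {
        "shared": sorted(region_chroms & annot_chroms),
        "only_in_regions": sorted(region_chroms - annot_chroms),
    }


# Toy data (not taken from the issue): the annotation lacks the 'chr' prefix,
# so no region chromosome matches any annotation chromosome.
regions = ["chr1:100-600", "chr2:50-550", "chrX:10-510"]
annotation = ["1", "2", "X"]
report = compare_chromosomes(regions, annotation)
print(report)
```

With real data, the region names would come from the index of each region set and the annotation chromosomes from the `Chromosome` column of the annotation. An empty `shared` list would point to a naming mismatch such as `chr1` versus `1`.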

mairamirza commented 5 months ago

Hi @SeppeDeWinter

I am encountering the same 'ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements' with this code chunk:

    for DAR in markers_dict.keys():
        regions = markers_dict[DAR].index[markers_dict[DAR].index.str.startswith('chr')]
        region_sets['DARs'][DAR] = pr.PyRanges(region_names_to_coordinates(regions))

My annot has the following chromosome names:

    Chromosome  Start      Strand  Gene           Transcript_type
    chr2        154551776  1       Actl10         protein_coding
    chrX        31117674   1       Btbd35f29      protein_coding
    chr7        84915781   1       Olfr290        protein_coding
    chrY        90839177   -1      Gm21748        protein_coding
    chr4        108719649  1       3110021N24Rik  protein_coding
    ...         ...        ...     ...            ...
    chr6        134791334  -1      Dusp16         protein_coding
    chr14       51203391   1       Ear14          protein_coding
    chr14       51203689   1       Ear14          protein_coding
    chr12       85274286   1       Zc2hc1c        protein_coding
    chr12       85288591   1       Zc2hc1c        protein_coding

and my markers_dict has the following chromosome details:

    {'AP, cycling':                   Log2FC    Adjusted_pval  Contrast
    chr1:146160518-146161018         2.952461  6.024392e-25   AP, cycling
    chr18:33138394-33138894          2.950420  5.393040e-25   AP, cycling
    chr2:120537711-120538211         2.950420  5.393040e-25   AP, cycling
    chr1:64330705-64331205           2.906402  5.527621e-25   AP, cycling
    chr7:125731094-125731594         2.904940  6.708995e-25   AP, cycling
    ...                              ...       ...            ...
    chr15:10746557-10747057          0.586071  2.838126e-08   AP, cycling
    chr7:89400669-89401169           0.586049  8.302689e-14   AP, cycling
    chr17:23726623-23727123          0.585742  3.541455e-20   AP, cycling
    chr8:54724231-54724731           0.585447  2.128771e-21   AP, cycling
    chr6:120471574-120472074         0.585394  3.926210e-09   AP, cycling

    [11247 rows x 3 columns], 'BP, neurogenic':  Log2FC    Adjusted_pval  Contrast
    chr12:52020608-52021108          4.009526  1.576404e-79   BP, neurogenic
    chr2:96499436-96499936           3.983610  1.576404e-79   BP, neurogenic

Could you please help me fix this error? Thanks
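The "Expected axis has 0 elements" part of the message suggests that an empty collection of region names may have reached `region_names_to_coordinates`. The thread does not confirm this, so treat it as an assumption. One way to test it is to look for region sets that end up empty after the `startswith('chr')` filter and skip them before conversion. The sketch below uses plain Python lists in place of the DataFrame index. The helper `drop_empty_region_sets` and the two set names marked hypothetical are invented for illustration.

```python
def drop_empty_region_sets(region_names_by_set, prefix="chr"):
    """Keep only sets that still contain 'chrom:start-end' names after filtering.

    Returns (kept, skipped): a dict of surviving sets and a list of the
    names of sets that would otherwise be passed on empty.
    """
    kept, skipped = {}, []
    for set_name, region_names in region_names_by_set.items():
        selected = [r for r in region_names if r.startswith(prefix) and ":" in r]
        if selected:
            kept[set_name] = selected
        else:
            skipped.append(set_name)
    return kept, skipped


toy_sets = {
    # Region names copied from the output posted above.
    "AP, cycling": ["chr1:146160518-146161018", "chr18:33138394-33138894"],
    # Hypothetical sets, added only to show the two ways a set can end up empty.
    "hypothetical_empty_set": [],
    "hypothetical_scaffold_only": ["GL456210.1:100-600"],
}
kept, skipped = drop_empty_region_sets(toy_sets)
print("kept:", list(kept))
print("skipped:", skipped)
```

If `skipped` turns out non-empty on the real `markers_dict`, those are the sets to inspect first. In the loop above, the same idea amounts to checking `len(regions) > 0` before calling `region_names_to_coordinates`.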