Closed saketkc closed 2 years ago
Hi @saketkc
Exciting to see people using it on pig data.
From your screenshot it looks like your motifs are not annotated to TFs. (i.e. _Directannot and _Orthologyannot are NaN). If this is the case for all motif hits found then this explains your issue. The function is trying to merge the motif hits by TF name and for this it needs these annotations.
Did you create a custom motif-to-TF annotation for pig? i.e. the following parameter path_to_motif_annotations
Best
Seppe
Thanks @SeppeDeWinter! I did create a custom motif-to-TF annotation (gist here):
#motif_id motif_name motif_description source_name source_version gene_name motif_similarity_qvalue similar_motif_id similar_motif_description orthologous_identityorthologous_gene_name orthologous_species description
MA0004.1 Arnt ARNT ARNT JASPAR2022 2022 ARNT 0 None None 0 None None gene is directly annotated
I also modified the associated function to read this file directly, keeping everything else the same:
def load_motif_annotations(specie: str,
version: str = 'v9',
fname: str = None,
column_names=('#motif_id', 'gene_name',
'motif_similarity_qvalue', 'orthologous_identity', 'description'),
motif_similarity_fdr: float = 0.001,
orthologous_identity_threshold: float = 0.0):
"""
Load motif annotations from a motif2TF snapshot.
Parameters
---------
specie:
Specie to retrieve annotations for.
version:
Motif collection version.
fname:
The snapshot taken from motif2TF.
column_names:
The names of the columns in the snapshot to load.
motif_similarity_fdr:
The maximum False Discovery Rate to find factor annotations for enriched motifs.
orthologuous_identity_threshold:
The minimum orthologuous identity to find factor annotations for enriched motifs.
Return
---------
A dataframe with motif annotations for each motif
"""
# Create a MultiIndex for the index combining unique gene name and motif ID. This should facilitate
# later merging.
fname = "/home/choudharys/github/JASPAR2022_CORE_vertebrates_non-redundant_pfms_motif_tfs.tbl"
df = pd.read_csv(fname, sep='\t', usecols=column_names)
df.rename(columns={'#motif_id':"MotifID",
'gene_name':"TF",
'motif_similarity_qvalue': "MotifSimilarityQvalue",
'orthologous_identity': "OrthologousIdentity",
'description': "Annotation" }, inplace=True)
df = df[(df["MotifSimilarityQvalue"] <= motif_similarity_fdr) &
(df["OrthologousIdentity"] >= orthologous_identity_threshold)]
# Direct annotation
df_direct_annot = df[df['Annotation'] == 'gene is directly annotated']
df_direct_annot = df_direct_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
df_direct_annot.index = df_direct_annot['MotifID']
df_direct_annot = pd.DataFrame(df_direct_annot['TF'])
df_direct_annot.columns = ['Direct_annot']
# Indirect annotation - by motif similarity
motif_similarity_annot = df[df['Annotation'].str.contains('similar') & ~df['Annotation'].str.contains('orthologous')]
motif_similarity_annot = motif_similarity_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
motif_similarity_annot.index = motif_similarity_annot['MotifID']
motif_similarity_annot = pd.DataFrame(motif_similarity_annot['TF'])
motif_similarity_annot.columns = ['Motif_similarity_annot']
# Indirect annotation - by orthology
orthology_annot = df[~df['Annotation'].str.contains('similar') & df['Annotation'].str.contains('orthologous')]
orthology_annot = orthology_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
orthology_annot.index = orthology_annot['MotifID']
orthology_annot = pd.DataFrame(orthology_annot['TF'])
orthology_annot.columns = ['Orthology_annot']
# Indirect annotation - by orthology
motif_similarity_and_orthology_annot = df[df['Annotation'].str.contains('similar') & df['Annotation'].str.contains('orthologous')]
motif_similarity_and_orthology_annot = motif_similarity_and_orthology_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
motif_similarity_and_orthology_annot.index = motif_similarity_and_orthology_annot['MotifID']
motif_similarity_and_orthology_annot = pd.DataFrame(motif_similarity_and_orthology_annot['TF'])
motif_similarity_and_orthology_annot.columns = ['Motif_similarity_and_Orthology_annot']
# Combine
df = pd.concat([df_direct_annot, motif_similarity_annot, orthology_annot, motif_similarity_and_orthology_annot], axis=1, sort=False)
return df
I do get back a dataframe with the motif ids annotated:
But it seems they are not getting assigned an annotation in the final output. Do you happen to know what am I missing?
Hi @saketkc !
MotifID in your dataframe is not correct, it should be MA0002.2
instead of MA0002.2 Runx1
to agree with your motif names in the DEM output. Can you try to fix that?
Cheers!
Carmen
Ahh, that was it. Thanks a lot both!
Hi Scenic+ Team,
I am also getting the same error as described in the above thread. I am following the exact same tutorial as described here and using this :https://scenicplus.readthedocs.io/en/latest/pbmc_multiome_tutorial.html#inferring-enhancer-driven-Gene-Regulatory-Networks-(eGRNs)-using-SCENIC+ and i am using this motif to TF file : motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl available at https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
My run_pycistarget
goes through without any error with a cistrome forming statement in logs
from scenicplus.wrappers.run_pycistarget import run_pycistarget
run_pycistarget(
region_sets = region_sets,
species = 'homo_sapiens',
save_path = os.path.join(work_dir, 'motifs'),
ctx_db_path = rankings_db,
dem_db_path = scores_db,
path_to_motif_annotations = motif_annotation,
run_without_promoters = True,
n_cpu = 1,
_temp_dir = os.path.join("/data/manke/processing/mirza/tmp/", 'ray_spill'),
annotation_version = 'v10nr_clust',
)
when I run menr['DEM_topics_top_3_No_promoters'].DEM_results('Topic8')
I get the following :
now when I run this running this:
from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
run_scenicplus(
scplus_obj = scplus_obj,
variable = ['GEX_celltype'],
species = 'hsapiens',
assembly = 'hg38',
tf_file = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial/data/utoronto_human_tfs_v_1.01.txt',
save_path = os.path.join(work_dir, 'scenicplus'),
biomart_host = biomart_host,
upstream = [1000, 150000],
downstream = [1000, 150000],
calculate_TF_eGRN_correlation = True,
calculate_DEGs_DARs = True,
export_to_loom_file = True,
export_to_UCSC_file = True,
path_bedToBigBed = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial',
n_cpu = 16,
_temp_dir = os.path.join(tmp_dir, 'ray_spill'))
except Exception as e:
#in case of failure, still save the object
dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
raise(e)
I get this error:
> 2024-01-18 10:27:31,296 SCENIC+_wrapper INFO /data/manke/processing/mirza/test_scenic/pbmc_tutorial/scenicplus folder already exists.
> 2024-01-18 10:27:31,299 SCENIC+_wrapper INFO Merging cistromes
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> Cell In[30], line 23
> 20 except Exception as e:
> 21 #in case of failure, still save the object
> 22 dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
> ---> 23 raise(e)
>
> Cell In[30], line 3
> 1 from scenicplus.wrappers.run_scenicplus import run_scenicplus
> 2 try:
> ----> 3 run_scenicplus(
> 4 scplus_obj = scplus_obj,
> 5 variable = ['GEX_celltype'],
> 6 species = 'hsapiens',
> 7 assembly = 'hg38',
> 8 tf_file = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial/data/utoronto_human_tfs_v_1.01.txt',
> 9 save_path = os.path.join(work_dir, 'scenicplus'),
> 10 biomart_host = biomart_host,
> 11 upstream = [1000, 150000],
> 12 downstream = [1000, 150000],
> 13 calculate_TF_eGRN_correlation = True,
> 14 calculate_DEGs_DARs = True,
> 15 export_to_loom_file = True,
> 16 export_to_UCSC_file = True,
> 17 path_bedToBigBed = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial',
> 18 n_cpu = 16,
> 19 _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
> 20 except Exception as e:
> 21 #in case of failure, still save the object
> 22 dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
>
> File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/scenicplus/wrappers/run_scenicplus.py:129, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
> 127 if 'Cistromes' not in scplus_obj.uns.keys():
> 128 log.info('Merging cistromes')
> --> 129 merge_cistromes(scplus_obj)
> 132 if 'search_space' not in scplus_obj.uns.keys():
> 133 log.info('Getting search space')
>
> File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/scenicplus/cistromes.py:90, in merge_cistromes(scplus_obj, cistromes_key, subset)
> 87 signatures_extend = {k:v for k,v in signatures_extend.items() if v}
> 89 if len(signatures_direct.keys()) == 0 and len(signatures_extend.keys()) == 0:
> ---> 90 raise ValueError("No cistromes found! Make sure that the motif enrichment results look good!")
> 92 #merge regions by TF name
> 93 if len(signatures_direct.keys()) > 0:
>
> ValueError: No cistromes found! Make sure that the motif enrichment results look good!
>
I would be grateful if somebody can help me solve this issue.
Thanks a bunch!
Hi @mairamirza
Thanks for using SCENIC+.
Can you check wether your motifs are annotated to TFs (in the html files)?
In the screenshot you are showing I don't see any annotation (there should be TF names listed), but this might be because the screenshot is cutoff.
All the best,
Seppe
Hi @SeppeDeWinter , Thanks for the reply. I have these following html files in my motif directory.
|-- CTX_DARs_All
| |-- B_cells_1.html
| |-- B_cells_2.html
| |-- CD14+_Monocytes.html
| |-- CD4_T_cells.html
| |-- CD8_T_cells.html
| |-- Dendritic_cells.html
| |-- FCGR3A+_Monocytes.html
| `-- NK_cells.html
|-- CTX_DARs_No_promoters
| |-- B_cells_1.html
| |-- B_cells_2.html
| |-- CD14+_Monocytes.html
| |-- CD4_T_cells.html
| |-- CD8_T_cells.html
| |-- Dendritic_cells.html
| |-- FCGR3A+_Monocytes.html
| `-- NK_cells.html
|-- CTX_topics_otsu_All
| |-- Topic10.html
| |-- Topic11.html
| |-- Topic12.html
| |-- Topic13.html
| |-- Topic14.html
| |-- Topic15.html
| |-- Topic16.html
| |-- Topic1.html
| |-- Topic2.html
| |-- Topic3.html
| |-- Topic4.html
| |-- Topic5.html
| |-- Topic6.html
| |-- Topic7.html
| |-- Topic8.html
| `-- Topic9.html
|-- CTX_topics_otsu_No_promoters
| |-- Topic10.html
| |-- Topic11.html
| |-- Topic12.html
| |-- Topic13.html
| |-- Topic14.html
| |-- Topic15.html
| |-- Topic16.html
| |-- Topic1.html
| |-- Topic2.html
| |-- Topic3.html
| |-- Topic4.html
| |-- Topic5.html
| |-- Topic6.html
| |-- Topic7.html
| |-- Topic8.html
| `-- Topic9.html
|-- CTX_topics_top_3_All
| |-- Topic10.html
| |-- Topic11.html
| |-- Topic12.html
| |-- Topic13.html
| |-- Topic14.html
| |-- Topic15.html
| |-- Topic16.html
| |-- Topic1.html
| |-- Topic2.html
| |-- Topic3.html
| |-- Topic4.html
| |-- Topic5.html
| |-- Topic6.html
| |-- Topic7.html
| |-- Topic8.html
| `-- Topic9.html
|-- CTX_topics_top_3_No_promoters
| |-- Topic10.html
| |-- Topic11.html
| |-- Topic12.html
| |-- Topic13.html
| |-- Topic14.html
| |-- Topic15.html
| |-- Topic16.html
| |-- Topic1.html
| |-- Topic2.html
| |-- Topic3.html
| |-- Topic4.html
| |-- Topic5.html
| |-- Topic6.html
| |-- Topic7.html
| |-- Topic8.html
| `-- Topic9.html
|-- DEM_DARs_All
| |-- B_cells_1.html
| |-- B_cells_2.html
| |-- CD14+_Monocytes.html
| |-- CD4_T_cells.html
| |-- CD8_T_cells.html
| |-- Dendritic_cells.html
| |-- FCGR3A+_Monocytes.html
| `-- NK_cells.html
|-- DEM_DARs_No_promoters
| |-- B_cells_1.html
| |-- B_cells_2.html
| |-- CD14+_Monocytes.html
| |-- CD4_T_cells.html
| |-- CD8_T_cells.html
| |-- Dendritic_cells.html
| |-- FCGR3A+_Monocytes.html
| `-- NK_cells.html
|-- DEM_topics_otsu_All
| |-- Topic10.html
| |-- Topic11.html
| |-- Topic12.html
| |-- Topic13.html
| |-- Topic14.html
| |-- Topic15.html
| |-- Topic16.html
| |-- Topic1.html
| |-- Topic2.html
| |-- Topic3.html
| |-- Topic4.html
| |-- Topic5.html
| |-- Topic6.html
| |-- Topic7.html
| |-- Topic8.html
| `-- Topic9.html
|-- DEM_topics_otsu_No_promoters
| |-- Topic10.html
| |-- Topic11.html
| |-- Topic12.html
| |-- Topic13.html
| |-- Topic14.html
| |-- Topic15.html
| |-- Topic16.html
| |-- Topic1.html
| |-- Topic2.html
| |-- Topic3.html
| |-- Topic4.html
| |-- Topic5.html
| |-- Topic6.html
| |-- Topic7.html
| |-- Topic8.html
| `-- Topic9.html
|-- DEM_topics_top_3_All
| |-- Topic10.html
| |-- Topic11.html
| |-- Topic12.html
| |-- Topic13.html
| |-- Topic14.html
| |-- Topic15.html
| |-- Topic16.html
| |-- Topic1.html
| |-- Topic2.html
| |-- Topic3.html
| |-- Topic4.html
| |-- Topic5.html
| |-- Topic6.html
| |-- Topic7.html
| |-- Topic8.html
| `-- Topic9.html
|-- DEM_topics_top_3_No_promoters
| |-- Topic10.html
| |-- Topic11.html
| |-- Topic12.html
| |-- Topic13.html
| |-- Topic14.html
| |-- Topic15.html
| |-- Topic16.html
| |-- Topic1.html
| |-- Topic2.html
| |-- Topic3.html
| |-- Topic4.html
| |-- Topic5.html
| |-- Topic6.html
| |-- Topic7.html
| |-- Topic8.html
| `-- Topic9.html
`-- menr.pkl
My B_cells1.html and Topic6.html files looks like these respectively
e6fe">
I am not sure if you are referring to these files. I am new to this, could you please guide me further.
Thanks.
Hi @mairamirza
Indeed, these are the correct files. However the motif-to-TF annotation is missing (at least one of the following columns should be present: Direct_annot
, Orthology_annot
).
Can you check wether you are using the correct motif-to-TF annotation file?
All the best,
Seppe
Describe the bug
I am trying to run scenicplus on a Pig dataset. I have generated motif scores following the steps in
create_cisTarget_databases
. I had to make multiple changes in the codebase to add support for this custom workflow, but I am running into issues with
merge_cistromes` step.run_pycistarget
goes through without issues and creates cistromes:The DEM hits make sense:
However, at the
merge_cistromes(scplus_obj)
step, I get an errorNo cistromes found! Make sure that the motif enrichment results look good!
To Reproduce
I am not sure how to provide a reproducible example, but if you could suggest a workflow for running Scenic+ on custom organisms, that would be wonderful! It is quite possible that I am missing something obvious (possibly because I had to change the code to handle a new motif database).
Error output
Version (please complete the following information):
Python: 3.7.10
SCENIC+: '0.1.dev445+g615c88b.d20220902'
If a bug is related to another module [e.g. matplotlib 3.3.9]
Additional context Add any other context about the problem here.