aertslab / scenicplus

SCENIC+ is a python package to build gene regulatory networks (GRNs) using combined or separate single-cell gene expression (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) data.
Other
182 stars 29 forks source link

No cistromes found #30

Closed saketkc closed 2 years ago

saketkc commented 2 years ago

Describe the bug

I am trying to run scenicplus on a Pig dataset. I have generated motif scores following the steps in create_cisTarget_databases . I had to make multiple changes in the codebase to add support for this custom workflow, but I am running into issues withmerge_cistromes` step.

run_pycistarget goes through without issues and creates cistromes:

run_pycistarget(
    region_sets=region_sets,
    species=species,
    save_path=os.path.join(out_dir, "motifs"),
    ctx_db_path=rankings_db,
    dem_db_path=scores_db,
    path_to_motif_annotations=motif_annotation,
    run_without_promoters=True,
    n_cpu=4,
    _temp_dir=os.path.join(tmp_dir, "ray_spill"),
    annotation_version="2022",
)

image

The DEM hits make sense: image

However, at the merge_cistromes(scplus_obj) step, I get an error No cistromes found! Make sure that the motif enrichment results look good!

To Reproduce

I am not sure how to provide a reproducible example, but if you could suggest a workflow for running Scenic+ on custom organisms, that would be wonderful! It is quite possible that I am missing something obvious (possibly because I had to change the code to handle a new motif database).

Error output

    84 
     85     if len(signatures_direct.keys()) == 0 and len(signatures_extend.keys()) == 0:
---> 86         raise ValueError("No cistromes found! Make sure that the motif enrichment results look good!")
     87 
     88     #merge regions by TF name

ValueError: No cistromes found! Make sure that the motif enrichment results look good!

Version (please complete the following information):

Additional context Add any other context about the problem here.

SeppeDeWinter commented 2 years ago

Hi @saketkc

Exciting to see people using it on pig data.

From your screenshot it looks like your motifs are not annotated to TFs. (i.e. _Directannot and _Orthologyannot are NaN). If this is the case for all motif hits found then this explains your issue. The function is trying to merge the motif hits by TF name and for this it needs these annotations.

Did you create a custom motif-to-TF annotation for pig? i.e. the following parameter path_to_motif_annotations

Best

Seppe

saketkc commented 2 years ago

Thanks @SeppeDeWinter! I did create a custom motif-to-TF annotation (gist here):

#motif_id       motif_name      motif_description       source_name     source_version  gene_name       motif_similarity_qvalue similar_motif_id        similar_motif_description       orthologous_identityorthologous_gene_name    orthologous_species     description
MA0004.1 Arnt   ARNT    ARNT    JASPAR2022      2022    ARNT    0       None    None    0       None    None    gene is directly annotated

I also modified the associated function to read this file directly, keeping everything else the same:

def load_motif_annotations(specie: str,
                           version: str = 'v9',
                           fname: str = None,
                           column_names=('#motif_id', 'gene_name',
                                         'motif_similarity_qvalue', 'orthologous_identity', 'description'),
                           motif_similarity_fdr: float = 0.001,
                           orthologous_identity_threshold: float = 0.0):
    """
    Load motif annotations from a motif2TF snapshot.

    Parameters
    ---------
    specie:
        Specie to retrieve annotations for.
    version:
        Motif collection version.
    fname: 
        The snapshot taken from motif2TF.
    column_names: 
        The names of the columns in the snapshot to load.
    motif_similarity_fdr: 
        The maximum False Discovery Rate to find factor annotations for enriched motifs.
    orthologuous_identity_threshold: 
        The minimum orthologuous identity to find factor annotations for enriched motifs.

    Return
    ---------
        A dataframe with motif annotations for each motif
    """
    # Create a MultiIndex for the index combining unique gene name and motif ID. This should facilitate
    # later merging.
    fname = "/home/choudharys/github/JASPAR2022_CORE_vertebrates_non-redundant_pfms_motif_tfs.tbl"
    df = pd.read_csv(fname, sep='\t', usecols=column_names)
    df.rename(columns={'#motif_id':"MotifID",
                       'gene_name':"TF",
                       'motif_similarity_qvalue': "MotifSimilarityQvalue",
                       'orthologous_identity': "OrthologousIdentity",
                       'description': "Annotation" }, inplace=True)
    df = df[(df["MotifSimilarityQvalue"] <= motif_similarity_fdr) &
            (df["OrthologousIdentity"] >= orthologous_identity_threshold)]

    # Direct annotation
    df_direct_annot = df[df['Annotation'] == 'gene is directly annotated']
    df_direct_annot = df_direct_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
    df_direct_annot.index = df_direct_annot['MotifID']
    df_direct_annot = pd.DataFrame(df_direct_annot['TF'])
    df_direct_annot.columns = ['Direct_annot']
    # Indirect annotation - by motif similarity
    motif_similarity_annot = df[df['Annotation'].str.contains('similar') & ~df['Annotation'].str.contains('orthologous')]
    motif_similarity_annot = motif_similarity_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
    motif_similarity_annot.index =  motif_similarity_annot['MotifID']
    motif_similarity_annot = pd.DataFrame(motif_similarity_annot['TF'])
    motif_similarity_annot.columns = ['Motif_similarity_annot']
    # Indirect annotation - by orthology
    orthology_annot = df[~df['Annotation'].str.contains('similar') & df['Annotation'].str.contains('orthologous')]
    orthology_annot = orthology_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
    orthology_annot.index = orthology_annot['MotifID']
    orthology_annot = pd.DataFrame(orthology_annot['TF'])
    orthology_annot.columns = ['Orthology_annot']
    # Indirect annotation - by orthology
    motif_similarity_and_orthology_annot = df[df['Annotation'].str.contains('similar') & df['Annotation'].str.contains('orthologous')]
    motif_similarity_and_orthology_annot = motif_similarity_and_orthology_annot.groupby(['MotifID'])['TF'].apply(lambda x: ', '.join(list(set(x)))).reset_index()
    motif_similarity_and_orthology_annot.index = motif_similarity_and_orthology_annot['MotifID']
    motif_similarity_and_orthology_annot = pd.DataFrame(motif_similarity_and_orthology_annot['TF'])
    motif_similarity_and_orthology_annot.columns = ['Motif_similarity_and_Orthology_annot']
# Combine
    df = pd.concat([df_direct_annot, motif_similarity_annot, orthology_annot, motif_similarity_and_orthology_annot], axis=1, sort=False)
    return df

I do get back a dataframe with the motif ids annotated:
Screenshot from 2022-09-05 06-59-38

But it seems they are not getting assigned an annotation in the final output. Do you happen to know what am I missing?

cbravo93 commented 2 years ago

Hi @saketkc !

MotifID in your dataframe is not correct, it should be MA0002.2 instead of MA0002.2 Runx1 to agree with your motif names in the DEM output. Can you try to fix that?

Cheers!

Carmen

saketkc commented 2 years ago

Ahh, that was it. Thanks a lot both!

mairamirza commented 9 months ago

Hi Scenic+ Team,

I am also getting the same error as described in the above thread. I am following the exact same tutorial as described here and using this :https://scenicplus.readthedocs.io/en/latest/pbmc_multiome_tutorial.html#inferring-enhancer-driven-Gene-Regulatory-Networks-(eGRNs)-using-SCENIC+ and i am using this motif to TF file : motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl available at https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl

My run_pycistarget goes through without any error with a cistrome forming statement in logs

from scenicplus.wrappers.run_pycistarget import run_pycistarget
run_pycistarget(
    region_sets = region_sets,
    species = 'homo_sapiens',
    save_path = os.path.join(work_dir, 'motifs'),
    ctx_db_path = rankings_db,
    dem_db_path = scores_db,
    path_to_motif_annotations = motif_annotation,
    run_without_promoters = True,
    n_cpu = 1,
    _temp_dir = os.path.join("/data/manke/processing/mirza/tmp/", 'ray_spill'),
    annotation_version = 'v10nr_clust',
    )

when I run menr['DEM_topics_top_3_No_promoters'].DEM_results('Topic8')

I get the following :

Screenshot 2024-01-18 at 4 50 40 PM

now when I run this running this:

from scenicplus.wrappers.run_scenicplus import run_scenicplus
try:
    run_scenicplus(
        scplus_obj = scplus_obj,
        variable = ['GEX_celltype'],
        species = 'hsapiens',
        assembly = 'hg38',
        tf_file = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial/data/utoronto_human_tfs_v_1.01.txt',
        save_path = os.path.join(work_dir, 'scenicplus'),
        biomart_host = biomart_host,
        upstream = [1000, 150000],
        downstream = [1000, 150000],
        calculate_TF_eGRN_correlation = True,
        calculate_DEGs_DARs = True,
        export_to_loom_file = True,
        export_to_UCSC_file = True,
        path_bedToBigBed = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial',
        n_cpu = 16,
        _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
except Exception as e:
    #in case of failure, still save the object
    dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
    raise(e)

I get this error:

> 2024-01-18 10:27:31,296 SCENIC+_wrapper INFO     /data/manke/processing/mirza/test_scenic/pbmc_tutorial/scenicplus folder already exists.
> 2024-01-18 10:27:31,299 SCENIC+_wrapper INFO     Merging cistromes
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> Cell In[30], line 23
>      20 except Exception as e:
>      21     #in case of failure, still save the object
>      22     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
> ---> 23     raise(e)
> 
> Cell In[30], line 3
>       1 from scenicplus.wrappers.run_scenicplus import run_scenicplus
>       2 try:
> ----> 3     run_scenicplus(
>       4         scplus_obj = scplus_obj,
>       5         variable = ['GEX_celltype'],
>       6         species = 'hsapiens',
>       7         assembly = 'hg38',
>       8         tf_file = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial/data/utoronto_human_tfs_v_1.01.txt',
>       9         save_path = os.path.join(work_dir, 'scenicplus'),
>      10         biomart_host = biomart_host,
>      11         upstream = [1000, 150000],
>      12         downstream = [1000, 150000],
>      13         calculate_TF_eGRN_correlation = True,
>      14         calculate_DEGs_DARs = True,
>      15         export_to_loom_file = True,
>      16         export_to_UCSC_file = True,
>      17         path_bedToBigBed = '/data/manke/processing/mirza/test_scenic/pbmc_tutorial',
>      18         n_cpu = 16,
>      19         _temp_dir = os.path.join(tmp_dir, 'ray_spill'))
>      20 except Exception as e:
>      21     #in case of failure, still save the object
>      22     dill.dump(scplus_obj, open(os.path.join(work_dir, 'scenicplus/scplus_obj.pkl'), 'wb'), protocol=-1)
> 
> File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/scenicplus/wrappers/run_scenicplus.py:129, in run_scenicplus(scplus_obj, variable, species, assembly, tf_file, save_path, biomart_host, upstream, downstream, region_ranking, gene_ranking, simplified_eGRN, calculate_TF_eGRN_correlation, calculate_DEGs_DARs, export_to_loom_file, export_to_UCSC_file, tree_structure, path_bedToBigBed, n_cpu, _temp_dir, save_partial, **kwargs)
>     127 if 'Cistromes' not in scplus_obj.uns.keys():
>     128     log.info('Merging cistromes')
> --> 129     merge_cistromes(scplus_obj)
>     132 if 'search_space' not in scplus_obj.uns.keys():
>     133     log.info('Getting search space')
> 
> File ~/localenv/mirza/anaconda/envs/scenicplus/lib/python3.8/site-packages/scenicplus/cistromes.py:90, in merge_cistromes(scplus_obj, cistromes_key, subset)
>      87 signatures_extend = {k:v for k,v in signatures_extend.items() if v}
>      89 if len(signatures_direct.keys()) == 0 and len(signatures_extend.keys()) == 0:
> ---> 90     raise ValueError("No cistromes found! Make sure that the motif enrichment results look good!")
>      92 #merge regions by TF name
>      93 if len(signatures_direct.keys()) > 0:
> 
> ValueError: No cistromes found! Make sure that the motif enrichment results look good!
> 

I would be grateful if somebody can help me solve this issue.

Thanks a bunch!

SeppeDeWinter commented 9 months ago

Hi @mairamirza

Thanks for using SCENIC+.

Can you check wether your motifs are annotated to TFs (in the html files)?

In the screenshot you are showing I don't see any annotation (there should be TF names listed), but this might be because the screenshot is cutoff.

All the best,

Seppe

mairamirza commented 9 months ago

Hi @SeppeDeWinter , Thanks for the reply. I have these following html files in my motif directory.

|-- CTX_DARs_All
|   |-- B_cells_1.html
|   |-- B_cells_2.html
|   |-- CD14+_Monocytes.html
|   |-- CD4_T_cells.html
|   |-- CD8_T_cells.html
|   |-- Dendritic_cells.html
|   |-- FCGR3A+_Monocytes.html
|   `-- NK_cells.html
|-- CTX_DARs_No_promoters
|   |-- B_cells_1.html
|   |-- B_cells_2.html
|   |-- CD14+_Monocytes.html
|   |-- CD4_T_cells.html
|   |-- CD8_T_cells.html
|   |-- Dendritic_cells.html
|   |-- FCGR3A+_Monocytes.html
|   `-- NK_cells.html
|-- CTX_topics_otsu_All
|   |-- Topic10.html
|   |-- Topic11.html
|   |-- Topic12.html
|   |-- Topic13.html
|   |-- Topic14.html
|   |-- Topic15.html
|   |-- Topic16.html
|   |-- Topic1.html
|   |-- Topic2.html
|   |-- Topic3.html
|   |-- Topic4.html
|   |-- Topic5.html
|   |-- Topic6.html
|   |-- Topic7.html
|   |-- Topic8.html
|   `-- Topic9.html
|-- CTX_topics_otsu_No_promoters
|   |-- Topic10.html
|   |-- Topic11.html
|   |-- Topic12.html
|   |-- Topic13.html
|   |-- Topic14.html
|   |-- Topic15.html
|   |-- Topic16.html
|   |-- Topic1.html
|   |-- Topic2.html
|   |-- Topic3.html
|   |-- Topic4.html
|   |-- Topic5.html
|   |-- Topic6.html
|   |-- Topic7.html
|   |-- Topic8.html
|   `-- Topic9.html
|-- CTX_topics_top_3_All
|   |-- Topic10.html
|   |-- Topic11.html
|   |-- Topic12.html
|   |-- Topic13.html
|   |-- Topic14.html
|   |-- Topic15.html
|   |-- Topic16.html
|   |-- Topic1.html
|   |-- Topic2.html
|   |-- Topic3.html
|   |-- Topic4.html
|   |-- Topic5.html
|   |-- Topic6.html
|   |-- Topic7.html
|   |-- Topic8.html
|   `-- Topic9.html
|-- CTX_topics_top_3_No_promoters
|   |-- Topic10.html
|   |-- Topic11.html
|   |-- Topic12.html
|   |-- Topic13.html
|   |-- Topic14.html
|   |-- Topic15.html
|   |-- Topic16.html
|   |-- Topic1.html
|   |-- Topic2.html
|   |-- Topic3.html
|   |-- Topic4.html
|   |-- Topic5.html
|   |-- Topic6.html
|   |-- Topic7.html
|   |-- Topic8.html
|   `-- Topic9.html
|-- DEM_DARs_All
|   |-- B_cells_1.html
|   |-- B_cells_2.html
|   |-- CD14+_Monocytes.html
|   |-- CD4_T_cells.html
|   |-- CD8_T_cells.html
|   |-- Dendritic_cells.html
|   |-- FCGR3A+_Monocytes.html
|   `-- NK_cells.html
|-- DEM_DARs_No_promoters
|   |-- B_cells_1.html
|   |-- B_cells_2.html
|   |-- CD14+_Monocytes.html
|   |-- CD4_T_cells.html
|   |-- CD8_T_cells.html
|   |-- Dendritic_cells.html
|   |-- FCGR3A+_Monocytes.html
|   `-- NK_cells.html
|-- DEM_topics_otsu_All
|   |-- Topic10.html
|   |-- Topic11.html
|   |-- Topic12.html
|   |-- Topic13.html
|   |-- Topic14.html
|   |-- Topic15.html
|   |-- Topic16.html
|   |-- Topic1.html
|   |-- Topic2.html
|   |-- Topic3.html
|   |-- Topic4.html
|   |-- Topic5.html
|   |-- Topic6.html
|   |-- Topic7.html
|   |-- Topic8.html
|   `-- Topic9.html
|-- DEM_topics_otsu_No_promoters
|   |-- Topic10.html
|   |-- Topic11.html
|   |-- Topic12.html
|   |-- Topic13.html
|   |-- Topic14.html
|   |-- Topic15.html
|   |-- Topic16.html
|   |-- Topic1.html
|   |-- Topic2.html
|   |-- Topic3.html
|   |-- Topic4.html
|   |-- Topic5.html
|   |-- Topic6.html
|   |-- Topic7.html
|   |-- Topic8.html
|   `-- Topic9.html
|-- DEM_topics_top_3_All
|   |-- Topic10.html
|   |-- Topic11.html
|   |-- Topic12.html
|   |-- Topic13.html
|   |-- Topic14.html
|   |-- Topic15.html
|   |-- Topic16.html
|   |-- Topic1.html
|   |-- Topic2.html
|   |-- Topic3.html
|   |-- Topic4.html
|   |-- Topic5.html
|   |-- Topic6.html
|   |-- Topic7.html
|   |-- Topic8.html
|   `-- Topic9.html
|-- DEM_topics_top_3_No_promoters
|   |-- Topic10.html
|   |-- Topic11.html
|   |-- Topic12.html
|   |-- Topic13.html
|   |-- Topic14.html
|   |-- Topic15.html
|   |-- Topic16.html
|   |-- Topic1.html
|   |-- Topic2.html
|   |-- Topic3.html
|   |-- Topic4.html
|   |-- Topic5.html
|   |-- Topic6.html
|   |-- Topic7.html
|   |-- Topic8.html
|   `-- Topic9.html
`-- menr.pkl

My B_cells1.html and Topic6.html files looks like these respectively

Screenshot 2024-01-20 at 2 06 21 PM Screenshot 2024-01-20 at 2 06 34 PM

e6fe">

I am not sure if you are referring to these files. I am new to this, could you please guide me further.

Thanks.

SeppeDeWinter commented 9 months ago

Hi @mairamirza

Indeed, these are the correct files. However the motif-to-TF annotation is missing (at least one of the following columns should be present: Direct_annot, Orthology_annot).

Can you check wether you are using the correct motif-to-TF annotation file?

All the best,

Seppe