A Snakemake workflow for performing genomic region set and gene set enrichment analyses using LOLA, GREAT, GSEApy, pycisTarget and RcisTarget.
The results are not generating

sandragold closed 1 year ago

sandragold commented 1 year ago


It was a great idea to create such a tool! I have an issue: no results are being generated, and I can't tell what the root cause might be.

When I run

$ snakemake -p --conda-frontend conda --configfile config/config.yaml -c1

I receive:

Config file config/config.yaml is extended by additional config specified via the command line.
Building DAG of [jobs...]
Nothing to be done (all requested files are present and up to date).
Complete log: .snakemake/log/2023-04-06T075124.125638.snakemake.log

The complete log contains the same information as the screen output above.

I have attached the enrichment_analysis_annotation.csv file. Here is the content of config.yaml:

# always use absolute paths

##### RESOURCES #####
partition: 'tinyq'
mem: '32000'
threads: 1

##### GENERAL #####
annotation: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/enrichment_analysis_annotation.csv
result_path: /mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/results/
project_name: Sorted_cyto_from_10kb_bins

# genome
# human 'hg19' or 'hg38' 
# mouse 'mm9' or 'mm10'
genome: 'hg38'

##### TOOLS #####

### GSEApy - ORA Enrichr (Fisher/hypergeometric test) and preranked GSEA based analysis

# Databases downloaded from Enrichr (https://maayanlab.cloud/Enrichr/#libraries)
# example: enrichr_dbs: ["KEGG_2021_Mouse", "GO_Biological_Process_2021", "WikiPathways_2019_Mouse"]
enrichr_dbs: ["KEGG_2021_Human", "GO_Biological_Process_2021", "WikiPathways_2019_Human"]

# Databases in GMT format containing Gene Symbols, e.g., downloaded from MSigDB (http://www.gsea-msigdb.org/gsea/msigdb)
    MyMSigDB: "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/databases/msigdb.v2023.1.Hs.symbols.gmt"

# path to local databases as JSON files will be loaded as dictionaries
# example content: { "MyDB_Term1": ["geneA","geneB","geneC"],"MyDB_Term2": ["geneX","geneY","geneZ"]}
    MyDB: "/mnt/polkanowa2/Cytometh_Bartosz/enrichment_analysis/enrichment_analysis/databases/c2.cp.wikipathways.v2023.1.Hs.json"

### GREAT - region-gene association based analysis

# databases to be queried from GREAT (https://great-help.atlassian.net/wiki/spaces/GREAT/pages/655440/Ontologies)
# not all ontologies are available for all genomes and GREAT versions (here we use version 4)
great_dbs: ['GO Molecular Function','GO Biological Process','GO Cellular Component','Mouse Phenotype','Mouse Phenotype Single KO','Human Phenotype']

### LOLA - region overlap based analysis

# databases to be queried by LOLA (https://databio.org/regiondb)
# not all databases are available for all genomes (eg mm10 only supports LOLACore)
lola_dbs: ['LOLACore','jaspar_motifs','roadmap_epigenomics']

### Enrichment plot

# tool specific column names for aggregation, plotting & summaries
    ORA_GSEApy:
        top_n: 25
        p_value: 'P_value'
        adj_pvalue: 'Adjusted_P_value'
        effect_size: 'Odds_Ratio'
        overlap: 'Overlap'
        term: 'Term'
    preranked_GSEApy:
        top_n: 25
        p_value: 'NOM_p_val'
        adj_pvalue: 'FDR_q_val'
        effect_size: 'NES'
        overlap: 'Tag'
        term: 'Term'
    GREAT:
        top_n: 25
        p_value: "HyperP"
        adj_pvalue: "HyperFdrQ"
        effect_size: "RegionFoldEnrich"
        overlap: "TermCov"
        term: "Desc"
    LOLA:
        top_n: 25
        p_value: "pValue"
        adj_pvalue: "qValue"
        effect_size: "oddsRatio"
        overlap: "support"
        term: "description"

# GREAT before
#     GREAT:
#         top_n: 25
#         p_value: "Hyper_Raw_PValue"
#         adj_pvalue: "Hyper_Adjp_BH"
#         effect_size: "Hyper_Fold_Enrichment"
#         overlap: "Hyper_Region_Set_Coverage"
#         term: "name"


# adjusted p-value threshold per tool to denote statistical significance
    ORA_GSEApy: 0.05
    preranked_GSEApy: 0.05
    GREAT: 0.01
    LOLA: 0.01

# number of top terms per feature set within each group for all overview plots (adjusted p-value, effect-size and bubble-heatmap)
top_terms_n: 5

# cap for adjusted p-value plotting: -log10(adjusted p-value) > adjp_cap -> adjp_cap
adjp_cap: 4

# cap for odds ratio plotting: abs(log2(odds ratio)) > or_cap -> sign(log2(odds ratio)) * or_cap
or_cap: 5

# cap for normalized enrichment scores (NES): abs(nes) > nes_cap -> sign(nes) * nes_cap
# applicable only to preranked_GSEApy
nes_cap: 5

If you need anything else, please let me know! Thanks for your help!


sreichl commented 1 year ago

Hi Sandra, thanks for reaching out. Everything (config and annotation) seems to be in order.

The problem seems to be the latest Snakemake version. Please try Snakemake version 7.15.2, which the module has been tested with, so I hope it will work then.
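
A minimal sketch of the downgrade, assuming Snakemake is managed with conda (the command in the report uses --conda-frontend conda). The helper names and the default environment name snakemake_pinned are hypothetical and not part of the workflow; only the version number 7.15.2 comes from this thread. The conda-forge and bioconda channels are assumed because Snakemake is commonly installed from them.

```shell
#!/usr/bin/env bash
# Version the module is reported to have been tested with (taken from this thread).
TESTED_SNAKEMAKE_VERSION="7.15.2"

# Hypothetical helper: succeeds only if the given version string
# equals the tested version exactly.
is_tested_version() {
    [ "$1" = "$TESTED_SNAKEMAKE_VERSION" ]
}

# Hypothetical helper: print (not run) a conda command that creates a
# dedicated environment with the pinned Snakemake version.
# $1 is the environment name; "snakemake_pinned" is an assumed default.
pinned_env_command() {
    printf 'conda create -y -n %s -c conda-forge -c bioconda snakemake=%s\n' \
        "${1:-snakemake_pinned}" "$TESTED_SNAKEMAKE_VERSION"
}
```

For example, `is_tested_version "$(snakemake --version)" || pinned_env_command` prints the environment-creation command only when the installed version differs from 7.15.2. The command is printed rather than executed so it can be reviewed before anything is installed.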

Cheers, Stephan

PS: If the issue is resolved, please close it.

sandragold commented 1 year ago


Thanks for your reply. My Snakemake version is 7.25.0. If you haven't tested the module with that version, I will downgrade to 7.15.2 and try again :)

Cheers, Sandra

sandragold commented 1 year ago

It worked, thanks!