egeulgen / pathfindR

pathfindR: Enrichment Analysis Utilizing Active Subnetworks
https://egeulgen.github.io/pathfindR/
Other
178 stars 25 forks source link

pathfindR: Gene set analysis mmu_kegg-> enrichment() output: NULL #54

Closed choijamtsmunkhzul closed 4 years ago

choijamtsmunkhzul commented 4 years ago

Hello egeulgen

Thank you so much for helping me to understand for using this package.

However I got into some problem with pathfindR enrichment() function:

So i have only mouse gene lists: Here is my input data:

genes <- read.table("genes.txt", header=TRUE)
head(genes)
#                    genes
#1                 Cdc37l1
#2                  Slc4a4
#3                  Smurf1
#4                 Akap17b
class(genes)
#[1] "data.frame"

dim(genes)
#[1] 2355    1

As you can see i have higher number of mouse single gene sets.

Then after that as you suggested i used fetch_gene():

fetch_gene <- fetch_gene_set(
  gene_sets = "mmu_KEGG", 
  min_gset_size = 10,
  max_gset_size = 300,
  custom_genes = NULL,
  custom_descriptions = NULL)

lets check "term descriptions" and "genes by term"

kegg_descriptions <- fetch_gene$term_descriptions
head(kegg_descriptions)
#mmu00010                                   mmu00020                                   mmu00030                                   mmu00040 
#"Glycolysis / Gluconeogenesis"                "Citrate cycle (TCA cycle)"                "Pentose phosphate pathway" "Pentose and glucuronate interconversions" 

kegg_genes <- fetch_gene$genes_by_term
head(kegg_genes)
#$mmu00010
#[1] "HK2"      "HK3"      "HK1"      "HKDC1"    "GCK"      "GPI1"     "PFKL"     "PFKM"     "PFKP"     "FBP1"     "FBP2"     "ALDOA"    "ALDOB"    "ALDOC"    "ALDOART1" "TPI1"     "GAPDH"    "GAPDHS"  

Looks good, Now lets use enrichment()

result_df1 <- enrichment(input_genes = genes$genes,
                        genes_by_term = kegg_genes,
                        term_descriptions = kegg_descriptions,
                        adj_method = "bonferroni", 
                        enrichment_threshold = 0.05, 
                        sig_genes_vec = genes$genes,
                        background_genes = unique(unlist(kegg_genes)))
# Output: NULL

But every time i run i always gets NULL, is it possible? even though I have larger gene sets?

Also i tried other way:

enrichment(
  input_genes =genes$genes,
  genes_by_term = pathfindR.data::mmu_kegg_genes,
  term_descriptions = pathfindR.data::mmu_kegg_descriptions,
  adj_method = "bonferroni",
  enrichment_threshold = 0.05,
  sig_genes_vec = genes$genes,
  background_genes= unique(unlist(pathfindR.data::mmu_kegg_genes)))
#Output: NULL

Is it normal to have NULL output even though i have large gene set?

Thank you so much

egeulgen commented 4 years ago

hey @choijamtsmunkhzul, do you mind sharing the input data? It is possible that no enrichment for any pathway was found because the background genes is not large enough

choijamtsmunkhzul commented 4 years ago

Hello, Sure,

Here is input data, mice gene set: input_genes_for_mmuKEGG.rds compressed as ZIP file. input_genes_for_mmuKEGG.zip

Thank you

egeulgen commented 4 years ago

I've tried a larger background gene set (the mmu STRING PIN genes) but there is no significant enrichment for any pathways after p-value adjustment. One other point: for not mismatching any gene symbols, pathfindR (run_pathfindR) turns all symbols to upper case so I did that manually.

To obtain results for this dataset, I suggest (a) using a different correction method (tried but did not work) (b) changing the enrichment threshold to be less stringent or (c) both.

Hope this helps, -E

library(pathfindR)

input_data <- readRDS("misc/issues/issue54/input_genes_for_mmuKEGG.RDS")
input_vec <- toupper(input_data$genes)

fetch_gene <- fetch_gene_set(
  gene_sets = "mmu_KEGG",
  min_gset_size = 10,
  max_gset_size = 300)
mmu_descriptions <- fetch_gene$term_descriptions
mmu_genes <- fetch_gene$genes_by_term

sif_path <- return_pin_path("mmu_STRING")
pin_df <- read.delim(sif_path, header = FALSE)
bg_genes_vec <- unique(c(pin_df$V1, pin_df$V3))

result_df1 <- enrichment(input_genes = input_vec,
                         genes_by_term = mmu_genes,
                         term_descriptions = mmu_descriptions,
                         adj_method = "bonferroni",
                         enrichment_threshold = 0.05,
                         sig_genes_vec = input_vec,
                         background_genes = bg_genes_vec)