YuLab-SMU / clusterProfiler

A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

simplify for gseaResult objects #162

Open gwangjinkim opened 6 years ago

gwangjinkim commented 6 years ago

Currently, simplify() is not implemented for gseaResult objects (the output of gseGO()). Is there a good theoretical reason not to allow simplification of this result? If not, I would love to see it implemented for gseaResult objects ...

gwangjinkim commented 6 years ago

Dear Guangchuang,

Thank you for clusterProfiler! It is great work!

Regarding my issue above: I worked around it with a dirty hack:

####################################################
# to make possible `simplify` on gseaResult object
# I just inserted following code into the script:
# And then, `simplify()` can be applied on `gseaResult` objects of GO GSEA analyses (`gseGO()`).

library(clusterProfiler) # provides the `simplify` generic and the `gseaResult` class
library(magrittr) # for %<>% (pipe and assign the result back) and %>%
setMethod("simplify", signature(x="gseaResult"),
          function(x, cutoff=0.7, by="p.adjust", select_fun=min, measure="Wang", semData=NULL) {
            # for gseGO() output, the ontology is stored in the `setType` slot
            if (!x@setType %in% c("BP", "MF", "CC"))
              stop("simplify only applies to output from gseGO...")
            x@result %<>% simplify_internal(., cutoff, by, select_fun,
                                            measure, x@setType, semData)
            return(x)
          }
)

# from:
# https://github.com/GuangchuangYu/clusterProfiler/blob/master/R/simplify.R
# I added `packagename::` in front of some function names, since this code is outside the package
# and does not "see" some of the packages/package functions imported 
# to the package environment
# but basically this is the unchanged `simplify_internal` function definition.

simplify_internal <- function(res, cutoff=0.7, by="p.adjust", select_fun=min, measure="Rel", ontology, semData) {
  if (missing(semData) || is.null(semData)) {
    if (measure == "Wang") {
      semData <- GOSemSim::godata(ont = ontology)
    } else {
      stop("godata should be provided for IC-based methods...")
    }
  } else {
    if (ontology != semData@ont) {
      msg <- paste("semData is for", semData@ont, "ontology, while enrichment result is for", ontology)
      stop(msg)
    }
  }

  sim <- GOSemSim::mgoSim(res$ID, res$ID,
                semData = semData,
                measure=measure,
                combine=NULL)

  ## to satisfy codetools for calling gather
  go1 <- go2 <- similarity <- NULL

  sim.df <- as.data.frame(sim)
  sim.df$go1 <- row.names(sim.df)
  sim.df <- tidyr::gather(sim.df, go2, similarity, -go1)

  sim.df <- sim.df[!is.na(sim.df$similarity),]

  ## feature 'by' is attached to 'go1'
  sim.df <- merge(sim.df, res[, c("ID", by)], by.x="go1", by.y="ID")
  sim.df$go2 <- as.character(sim.df$go2)

  ID <- res$ID

  GO_to_remove <- character()
  for (i in seq_along(ID)) {
    ii <- which(sim.df$go2 == ID[i] & sim.df$similarity > cutoff)
    ## if length(ii) == 1, then go1 == go2
    if (length(ii) < 2)
      next

    sim_subset <- sim.df[ii,]

    jj <- which(sim_subset[, by] == select_fun(sim_subset[, by]))

    ## sim.df <- sim.df[-ii[-jj]]
    GO_to_remove <- c(GO_to_remove, sim_subset$go1[-jj]) %>% unique
  }

  res[!res$ID %in% GO_to_remove, ]
}

########################################

After that, it is possible to call simplify(gseGO.result, cutoff = 0.7, by = "p.adjust", select_fun = min).
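
To see what the loop in `simplify_internal` does without installing GOSemSim, here is a minimal base-R sketch on a toy similarity matrix. The term names, similarity values, p-values and the helper name `prune_redundant` are all invented for illustration; this is not the package's code, only my reading of the same logic: for each term, collect every term whose similarity to it exceeds `cutoff`, keep the one selected by `select_fun` on the `by` column, and drop the rest.

```r
# Toy pairwise similarity matrix for four made-up terms (values are invented).
terms <- c("T1", "T2", "T3", "T4")
sim <- matrix(c(1.0, 0.9, 0.2, 0.1,
                0.9, 1.0, 0.3, 0.1,
                0.2, 0.3, 1.0, 0.8,
                0.1, 0.1, 0.8, 1.0),
              nrow = 4, dimnames = list(terms, terms))

# Toy enrichment table with invented adjusted p-values.
res <- data.frame(ID = terms,
                  p.adjust = c(0.01, 0.04, 0.03, 0.002),
                  stringsAsFactors = FALSE)

# Hypothetical helper mirroring the pruning loop of the hack above.
prune_redundant <- function(res, sim, cutoff = 0.7, by = "p.adjust",
                            select_fun = min) {
  to_remove <- character()
  for (id in res$ID) {
    # Terms more similar to `id` than the cutoff (this includes `id` itself).
    similar <- rownames(sim)[sim[, id] > cutoff]
    if (length(similar) < 2) next
    # Look up the `by` value of each similar term and keep the selected one(s).
    scores <- res[match(similar, res$ID), by]
    keep <- scores == select_fun(scores)
    to_remove <- union(to_remove, similar[!keep])
  }
  res[!res$ID %in% to_remove, , drop = FALSE]
}

simplified <- prune_redundant(res, sim)
print(simplified)
```

With these toy values, T1/T2 and T3/T4 form two redundant pairs, and the term with the smaller adjusted p-value survives in each pair.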

GuangchuangYu commented 6 years ago

thanks for your effort. Will look into it.

gwangjinkim commented 6 years ago

Welcome!

And thanks for your effort to create clusterProfiler! And ChIPseeker and so many other super-useful repositories! Amazing!!

I still have a lot to learn in R and bioinformatics in general (I switched from the wet lab to bioinformatics just 3 years ago). What kind of projects are you working on nowadays? Could I contribute to something? I have realized that I need a mentor.

Best, Gwang-Jin

GuangchuangYu commented 6 years ago

thanks @gwangjinkim.

You are welcome to contribute to my github repos.

gwangjinkim commented 6 years ago

Thank you! Sure!

A question about the 'simplify()' function in 'clusterProfiler':

I found a way to check with GO.db whether a GO term is terminal or not. https://support.bioconductor.org/p/35789/

I see that the core of simplify() is the mgoSim() function. https://github.com/GuangchuangYu/clusterProfiler/blob/master/R/simplify.R

However, I see that its default is organism="human". If I enter mouse GO IDs, will it simplify for "human"?

gwangjinkim commented 6 years ago

Sorry for my ignorance, but are mouse GO IDs basically the same as human ones, so that the same GO ID "means" the same GO-annotated function in mouse and human? Or does a given ID refer to different terms in different species? I am trying to find an answer on Google, but it takes time ...

nikofleischer commented 3 years ago

Is there any progress on this issue? I tried out the 'hack', but it no longer seems to work with the newest version :(

DarioS commented 1 month ago

Sigh.

> class(x)
  "enrichResult"
> x <- pairwise_termsim(x)
> simplify(x)
Error in .local(x, ...) : 
  simplify only applied to output from gsegO and enrichGO...