GangLiLab / genekitr

🧬 Gene analysis toolkit based on R
https://www.genekitr.fun
GNU General Public License v3.0
53 stars 7 forks source link

plotGSEA classic type for non-model species #15

Closed Natpod closed 1 year ago

Natpod commented 1 year ago

Describe the bug Hello, I was planning to coerce a fgsea (preranked gsea) result onto a plotEnrich function for plotting, with a previous step of gene count and GeneRatio calculation, geneID_symbol mapping and column name changes so that the dataframe looked identical to the model dataframe returned by genGSEA (which I find less flexible than fgsea).

However, when I attempted plotting the results for a single category which I checked was in the gsea_df, i recieved an error

Error in `$<-.data.frame`(`*tmp*`, "gene", value = c("BnaA04g22070D", : 
replacement has 64949 rows, data has 65732

With the following traceback:

8.
stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
"replacement has %d rows, data has %d"), N, nrows), domain = NA)
7.
`$<-.data.frame`(`*tmp*`, "gene", value = c("BnaA04g22070D", 
NA, NA, NA, NA, "BnaC01g43250D", NA, NA, NA, NA, "BnaC07g39370D", 
"BnaA03g47170D", NA, NA, "BnaCnng19060D", NA, NA, "BnaC01g19310D", 
NA, NA, "BnaC07g50360D", NA, NA, NA, "BnaA05g03390D", NA, NA, ...
6.
`$<-`(`*tmp*`, "gene", value = c("BnaA04g22070D", NA, NA, NA, 
NA, "BnaC01g43250D", NA, NA, NA, NA, "BnaC07g39370D", "BnaA03g47170D", 
NA, NA, "BnaCnng19060D", NA, NA, "BnaC01g19310D", NA, NA, "BnaC07g50360D", 
NA, NA, NA, "BnaA05g03390D", NA, NA, NA, NA, NA, NA, NA, NA, ...
5.
calcScore(geneset, genelist, x, exponent, fortify = TRUE, org)
4.
FUN(X[[i]], ...)
3.
lapply(show_pathway, function(x) {
calcScore(geneset, genelist, x, exponent, fortify = TRUE, 
org)
})
2.
do.call(rbind, lapply(show_pathway, function(x) {
calcScore(geneset, genelist, x, exponent, fortify = TRUE, 
org)
}))
1.
genekitr::plotGSEA(BP_HDAC_list, plot_type = "classic", show_pathway = "GO:0040029")

To Reproduce reprex_plotGSEA_filtered.xlsx

This is my excel file representing the list of different dataframes I used after preprocessing (with names "gsea_df", "genelist", "geneset", "exponent" and "org" . I am working with Brassica napus external_gene_name ENA identifiers

I filtered the gsea_result to having only 21 rows in order to preserve confidenciality of my results, but it still has the identifier Im looking forward to create a GSEA plot from, GO:0040029. If this is a problem for test generation, please confirm.

Additional context Any other supplements?

Natpod commented 1 year ago

It seems that the problem in calcScore is that it does not recognize many gene symbol identifiers from my ranked expression dataset, so when it reasigns the gene symbols to the ES score df it sets off a data.frame assignment error...

The thing is that I do not want any ID conversion following the score calculations whatsoever, because many genes from my dataset will be discarded in this conversion step for my non-model organism (Brassica napus). This is because there are not many cross annotations between annotated genes in ENA / EMBL databases and NCBI databases for this organism.

Is there any way you can add a functionality that avoids this automatic ID conversion to gene symbols?

Natpod commented 1 year ago

a proof of this is that when I try plotting by "fgsea" plots, the gene ranks are plotted correctly without problem. So i'm guessing it has to do with that traceback step of ID conversion, as the 'fgsea' plot functionality does not involve this step

Thank you

reedliu commented 1 year ago

Fixed, please use version >= 1.2.0.

plotGSEA(gse, plot_type = "classic", 
     show_pathway = c('GO:0045490','GO:1901136'), 
     show_gene = c('BnaA06g28350D','BnaA01g34980D'))
image
Natpod commented 1 year ago

Splendid, thank you for the fast response!