YuLab-SMU / GOSemSim

:golf: GO-terms Semantic Similarity Measures
https://yulab-smu.top/biomedical-knowledge-mining-book/

buildGOmap issues, therefore 2 suggestions #47

Open guidohooiveld opened 7 months ago

guidohooiveld commented 7 months ago

@GuangchuangYu , @huerqiang

On the Bioconductor support forum, an error was reported with the function buildGOmap: https://support.bioconductor.org/p/9156358/

I had a quick look, and the cause seems to be that a) the required input for buildGOmap is counter-intuitive, and b) buildGOmap explicitly expects a column labelled "GO" in its input.

Regarding a): buildGOmap requires the gene IDs in the 1st column and the GO IDs in the 2nd. This is counter-intuitive because the generic enrichment functions enricher and GSEA require the reverse order for their TERM2GENE input (1st column the GO IDs, 2nd column the gene IDs). Maybe good to align this, or at least explain it better on the help page? Could an example also be added to the help page?

Regarding b): The function buildGOmap_internal hard-codes the requirement that the column with the GO IDs be labelled GO: https://github.com/YuLab-SMU/GOSemSim/blob/1800f404145ac9788685db07f3d9ad6c70f65cc3/R/buildGOmap.R#L41

and

https://github.com/YuLab-SMU/GOSemSim/blob/1800f404145ac9788685db07f3d9ad6c70f65cc3/R/buildGOmap.R#L45

This requirement is not stated on the help page of buildGOmap, so could you please add it? As it stands, a differently named column leads to the reported error.
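For anyone still on a version with this behaviour, a minimal workaround sketch follows. It is based only on the description in this issue (gene IDs in the 1st column, GO IDs in the 2nd, GO column labelled "GO"). The column name "Gene" and the toy data frame are assumptions for illustration, and the final buildGOmap call is left commented out because it needs GOSemSim installed.

```r
## Toy stand-in for the downloaded annotation table; values copied from this thread.
term2gene <- data.frame(
  Accession = c("GO:0005524", "GO:0006270", "GO:0003735"),
  Locus.Tag = c("PA0001", "PA0001", "PA5570"),
  stringsAsFactors = FALSE
)

## Reorder to gene-first, GO-second and label the GO column "GO".
## "Gene" is an assumed name; the issue only mentions a requirement for "GO".
gomap_input <- data.frame(
  Gene = term2gene$Locus.Tag,
  GO   = term2gene$Accession,
  stringsAsFactors = FALSE
)

## Drop rows without a GO ID, then exact duplicate rows.
gomap_input <- unique(gomap_input[nzchar(gomap_input$GO), ])

## With GOSemSim loaded, the call would then be (not run here):
## go_map <- buildGOmap(gomap_input)
```

With the GitHub version discussed further down in this thread, this reshaping should no longer be needed.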

Thanks, G

FWIW: I have attached the GO mapping file that was used on the Bioconductor support forum and triggered the error. It was downloaded from https://www.pseudomonas.com/goterms/list (as csv).

gene_ontology_csv.csv

GuangchuangYu commented 7 months ago

@guidohooiveld thanks and please test with the github version.

guidohooiveld commented 7 months ago

@GuangchuangYu : thanks for having a look at this so quickly; much appreciated.

After updating it works fine; thanks!

I have also edited my answer on the Bioconductor support forum to include a link to this thread.

> BiocManager::install(c('YuLab-SMU/GOSemSim'), force=TRUE)
Bioconductor version 3.18 (BiocManager 1.30.22), R 4.3.0 (2023-04-21 ucrt)
Installing github package(s) 'YuLab-SMU/GOSemSim'
Downloading GitHub repo YuLab-SMU/GOSemSim@HEAD
<<snip>>
>
>
> library(clusterProfiler)
> packageVersion("GOSemSim")
[1] ‘2.29.1.1’
> 
> Pa_GO <- read.csv("gene_ontology_csv.csv")
> Pa_GOterms <- Pa_GO[c(5,1)]
> 
> ## check;
> ## note that order of columns now aligns with those as in TERM2GENE,
> ## and that names did NOT have to be changed!
> colnames(Pa_GOterms)
[1] "Accession" "Locus.Tag"
> dim(Pa_GOterms)
[1] 15883     2
> head(Pa_GOterms)
   Accession Locus.Tag
1 GO:0005524    PA0001
2 GO:0006270    PA0001
3 GO:0006275    PA0001
4 GO:0016887    PA0001
5 GO:0016887    PA0001
6 GO:0006260    PA0001
> tail(Pa_GOterms)
       Accession Locus.Tag
15878 GO:0008033    PA5569
15879 GO:0001682    PA5569
15880 GO:0004526    PA5569
15881 GO:0003735    PA5570
15882 GO:0005840    PA5570
15883 GO:0006412    PA5570
> 
> Pa_GOMap <- buildGOmap(Pa_GOterms)
> ## check; note list is longer and 'tail' shows additional GO IDs.
> dim(Pa_GOMap)
[1] 119221      2
> head(Pa_GOMap)
   Accession Locus.Tag
1 GO:0005524    PA0001
2 GO:0006270    PA0001
3 GO:0006275    PA0001
4 GO:0016887    PA0001
5 GO:0016887    PA0001
6 GO:0006260    PA0001
> tail(Pa_GOMap)
        Accession Locus.Tag
183988 GO:0044249    PA5570
183989 GO:0044271    PA5570
183990 GO:0071704    PA5570
183991 GO:1901564    PA5570
183992 GO:1901566    PA5570
183993 GO:1901576    PA5570
> 
>
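A quick sanity check along the following lines could be used to confirm that the expanded map still contains every original annotation and to count what was added. This is only a sketch: `original` and `expanded` are hypothetical toy stand-ins for Pa_GOterms and Pa_GOMap (a few rows for PA5570 copied from the output above), and `pair_key` is a helper introduced here for illustration.

```r
## Toy stand-ins for the input and output of buildGOmap (rows copied from this thread).
original <- data.frame(
  Accession = c("GO:0003735", "GO:0005840"),
  Locus.Tag = c("PA5570", "PA5570"),
  stringsAsFactors = FALSE
)
expanded <- data.frame(
  Accession = c("GO:0003735", "GO:0005840", "GO:0044249", "GO:0071704"),
  Locus.Tag = rep("PA5570", 4),
  stringsAsFactors = FALSE
)

## Hypothetical helper: one string per (GO ID, gene) pair so pairs can be compared as sets.
pair_key <- function(df) paste(df$Accession, df$Locus.Tag, sep = "|")

## Every original pair should survive the expansion.
all_kept <- all(pair_key(original) %in% pair_key(expanded))

## Number of distinct pairs that appear only in the expanded map.
n_added <- length(setdiff(pair_key(expanded), pair_key(original)))

## Number of repeated pairs in the expanded map.
n_dup <- sum(duplicated(pair_key(expanded)))

print(c(all_kept = all_kept, n_added = n_added, n_dup = n_dup))
```

On the real objects, replacing `original` with Pa_GOterms and `expanded` with Pa_GOMap would give the same three numbers for the full data set.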