Closed benedettafrida closed 6 years ago
not all genes/proteins have GO annotation.
Hi, thanks for responding so promptly.
Yes sure, not all proteins are annotated. But everyone in the example does. The one that is left out from the matrix(Q5CZC0) has only one term but still has it: GO:0005739.
I also checked with other proteins with a single GO term, but the error is not reproducible. Here's the full list of protein I was testing my pipeline:
Q8NF91" "Q8WXH0" "Q63HN8" "P21817" "Q92736" "A2VEC9" "Q8WZ42" "Q2LD37" "Q8NEZ4" "P58107" "Q5CZC0" "Q6V0I7" "Q9Y6R7" "Q96RW7" "Q8NDA2" "Q4G0P3" "Q8WXG9" "Q9UPN3" "O14686" "Q03001" "P0DPF2" "Q9NU22" "Q5VST9" "Q9Y6V0" "P98088" "Q9HC84" "Q8WXI7" "Q7Z5P9" "Q02817" "Q9UKN1" "P20929" "O75445" "Q5T4S7" "Q8IVF2" "Q09666" "Q86UQ4"
They all have GO terms. Only Q5CZC0
Q9Y6R7
and P0DPF2
has a single GO entry.
by using all of them (36) as in sim_matrix<-mgeneSim(b,b, semData=hsGO2, measure="Wang", combine="BMA", verbose=FALSE)
where b is the the full list above, I get a 30by30 matrix.
for Q5CZC0 Q9Y6R7 and P0DPF2, only the second one has 1 GO term annotated.
> geneSim(gene[1], gene[1], hsGO2)
[1] NA
> geneSim(gene[2], gene[2], hsGO2)
$geneSim
[1] 1
$GO1
[1] "GO:0005515"
$GO2
[1] "GO:0005515"
> geneSim(gene[3], gene[3], hsGO2)
[1] NA
> gene
[1] "Q5CZC0" "Q9Y6R7" "P0DPF2"
I see. Then there's a discrepancy between what is retrieved from UniProtKB/SwissProt and method you're using for fetching GO terms.
Here's what I downloaded from Uniprot. fromUniprot.txt
Is there a way of making multiple genes comparisons directly by inputting the GO terms? In the documentation, I only saw mgoSim which allows pairwise comparison (if I understood correctly).
Yes, you can use mgoSim
Hi, I'm having some issues with the output of mgeneSim (but also mclusterSim). I'm trying to make a GO similarity analysis on some proteins in an all against all fashion (which for my understanding of the documentation is done by using either mgeneSim or mclusterSim. Problem is, not all the proteins are present in the output matrix.
Here is my code:
the length of uniprotIDs =11 and using this code
sqrt(length(sim_matrix))=10
I've also tried using:
sim_matrix<-mgeneSim(uniprotIDs, semData=hsGO2, measure="Wang", combine="BMA", verbose=FALSE)
But in this case
sqrt(length(sim_matrix))=9
To double check that the problem didn't lie in the use of UniprotIDs I also tried using GeneIDs using the following code:
With the same results. Can you tell me if I'm doing something wrong?