Sorry for opening a new issue, but you closed it before I had the time to reply. I'm copying the previous text and my follow up below.
I'm having some issues with the output of mgeneSim (but also mclusterSim).
I'm trying to run a GO similarity analysis on some proteins in an all-against-all fashion (which, as I understand the documentation, is done with either mgeneSim or mclusterSim). The problem is that not all the proteins are present in the output matrix.
With the same results.
Can you tell me if I'm doing something wrong?
To which you replied:
not all genes/proteins have GO annotation.
Yes, sure, not all proteins are annotated. But every protein in the example is. The one that is left out of the matrix (Q5CZC0) has only one term, but it does have one: GO:0005739.
I also checked other proteins with a single GO term, but the error is not reproducible with them. Here's the full list of proteins I was testing my pipeline with:
"Q8NF91" "Q8WXH0" "Q63HN8" "P21817" "Q92736" "A2VEC9" "Q8WZ42" "Q2LD37" "Q8NEZ4" "P58107" "Q5CZC0" "Q6V0I7" "Q9Y6R7" "Q96RW7" "Q8NDA2" "Q4G0P3" "Q8WXG9" "Q9UPN3" "O14686" "Q03001" "P0DPF2" "Q9NU22" "Q5VST9" "Q9Y6V0" "P98088" "Q9HC84" "Q8WXI7" "Q7Z5P9" "Q02817" "Q9UKN1" "P20929" "O75445" "Q5T4S7" "Q8IVF2" "Q09666" "Q86UQ4"
They all have GO terms. Only Q5CZC0, Q9Y6R7 and P0DPF2 have a single GO entry.
Using all of them (36), as in
sim_matrix<-mgeneSim(b,b, semData=hsGO2, measure="Wang", combine="BMA", verbose=FALSE)
where b is the full list above, I get a 30 by 30 matrix.
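To pin down exactly which IDs are dropped, the input vector can be compared with the row names of the result. The following is only a minimal sketch in base R: the small `b` and `sim_matrix` here are made-up stand-ins for illustration, not the real objects from the call above. With the real objects, only the `setdiff` step would be needed.

```r
# Toy stand-ins (hypothetical values, for illustration only):
# three input IDs, of which only two end up in the matrix.
b <- c("Q8NF91", "Q5CZC0", "P21817")
kept <- c("Q8NF91", "P21817")
sim_matrix <- matrix(1, nrow = 2, ncol = 2, dimnames = list(kept, kept))

# IDs that were passed in but do not appear among the matrix row names
missing_ids <- setdiff(b, rownames(sim_matrix))
print(missing_ids)
```

Applied to the full list of 36, this should return the six IDs missing from the 30 by 30 matrix.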