bio-ontology-research-group / deepgo

Function prediction using a deep ontology-aware classifier
http://deepgo.bio2vec.net

Question about the Hierarchical classification layout #16

Closed · datduong closed this issue 5 years ago

datduong commented 5 years ago

In section 2.6, this layer is described as: "Each network consists of one fully connected layer with a sigmoid activation function, and takes as an input the output of first fully connected layer."

I am quite confused about this setup. I tried to read the code, but was still unable to understand it. Would you be able to clarify the model?

I understand that the input protein is transformed into a vector P of length 1024. Let's assume a simple scenario where we have GO1 and GO2, and GO1 is the parent of GO2.

You can easily fit a dense model f2, so that f2(P) predicts whether GO2 is assigned to P. Next, for GO1, what are the exact steps? According to the statement "takes as an input the output of first fully connected layer", do we have f1 as the dense layer for GO1? Do we fit f1(f2(P))? But f2(P) produces a single number in the range 0 to 1, so fitting f1(f2(P)) does not make sense to me, because the input of f1 would be a single number.

Thanks for your help.

coolmaksat commented 5 years ago

Hi, parent classes are connected to their children with maximum merge layers. So the output for GO1 will be something like this: max(f1(P), f2(P))
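In Keras-style code, the wiring looks roughly like the sketch below (layer names here are illustrative, not the exact code from the repo):

```python
from tensorflow.keras.layers import Dense, Input, Maximum
from tensorflow.keras.models import Model

# Shared 1024-dimensional protein representation P
p = Input(shape=(1024,), name='protein_features')

# One small sigmoid head per GO term, each reading the SAME input P
f1 = Dense(1, activation='sigmoid', name='GO1_raw')(p)  # parent term GO1
f2 = Dense(1, activation='sigmoid', name='GO2_raw')(p)  # child term GO2

# GO2 is a leaf, so its output is just f2(P). GO1's output is the
# maximum of its own head and its child's output, so a confident
# GO2 prediction pushes the GO1 score up as well.
go2_out = f2
go1_out = Maximum(name='GO1_out')([f1, go2_out])

model = Model(inputs=p, outputs=[go1_out, go2_out])
```

So f1 never takes f2(P) as input; both heads read P, and the hierarchy only enters through the max merge on the outputs.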

datduong commented 5 years ago

Thanks for the reply. I have a follow-up question. In your paper, Table 1 has results for "selected" GO terms and for "all" GO terms. Would you be able to clarify how you computed the predictions for all the GO terms? Going back to the example where GO1 is the parent of GO2, suppose only GO1 was selected because it occurs frequently. Predicting whether protein P has GO1 is easy, because we have the true label and can train the model max(f1(P), f2(P)) well. For GO2, would you simply use f2(P) to predict whether P has GO2?

Thanks.

coolmaksat commented 5 years ago

Hi, if GO2 was not selected, it means that our model cannot predict it. When we evaluate on all GO terms, we consider GO2 annotations as false negatives. We have a solution to this problem, and to some other limitations of DeepGO, in our new model. Check out https://github.com/bio-ontology-research-group/deepgoplus

datduong commented 5 years ago

Thanks for the new link.

I still don't understand the part about considering GO2 annotations as false negatives. My current understanding is as follows: suppose that, under the ground truth, P has both GO1 and GO2. You train the model using only the label GO1 for P. During prediction, you would use f2(P) to predict whether P has GO2, even though you never trained on the P-GO2 relationship. If this prediction is "no", then you count it as a false negative. Is my understanding correct?

Thanks.

coolmaksat commented 5 years ago

Your understanding is correct, except that during prediction I won't use f2(P), because I did not train such a function. And since I did not predict GO2 at all, the prediction is "no" and it is counted as a false negative.

datduong commented 5 years ago

Hi, sorry for another question; I am confused again. When you evaluate on all the GO terms, you still need a prediction for whether P has GO2 (even though GO2 was excluded from the training data because of its low occurrence frequency). So you have to use f2(P). Do you always count the prediction for GO2 as a false negative, even if f2(P) is large (say, 0.9)? Thanks for your help.

coolmaksat commented 5 years ago

Hi, sorry for confusing you. The thing is that we don't predict all GO terms, but we have to consider all GO terms when we evaluate our model. That is why we count the terms that we cannot predict as false negatives.
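Concretely, the evaluation amounts to something like this per-protein count (a simplified sketch; the function and set names are illustrative):

```python
def count_confusion(true_terms, predicted_terms):
    """Per-protein TP/FP/FN counts over ALL GO terms.

    Terms that were not selected for training can never appear in
    predicted_terms, so any true annotation among them automatically
    becomes a false negative.
    """
    tp = len(true_terms & predicted_terms)
    fp = len(predicted_terms - true_terms)
    fn = len(true_terms - predicted_terms)  # includes unpredictable terms
    return tp, fp, fn

# GO2 is a true annotation but was never selected, so it is never
# predicted and counts toward fn:
tp, fp, fn = count_confusion({'GO1', 'GO2'}, {'GO1'})
# tp == 1, fp == 0, fn == 1
```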

datduong commented 4 years ago

Sorry for coming back to this question. I am trying to apply this method to CAFA data in PyTorch, because I already have other components in PyTorch. For the MF ontology, during training, the loss function takes in only about 590 labels. However, to estimate a predicted probability for one of these 590 labels, we need to use all the MF terms (about 10,000 or so). Is my understanding correct?

I couldn't fit the model on one 11 GB GPU; how many GPUs did you need to use? Thanks, and happy new year.

coolmaksat commented 4 years ago

Hi, I don't think you need to use all labels to estimate the probability for a class. You use all labels only when you evaluate final performance. If I remember correctly, all three models take around 16 GB of memory.
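For the PyTorch side, one way to set this up is sketched below (a rough sketch under my assumptions about your setup; the class and variable names are hypothetical, and the hierarchy propagation is shown for a single parent-child level only):

```python
import torch
import torch.nn as nn

class HierarchicalHeads(nn.Module):
    """One sigmoid head per SELECTED GO term; a parent's score is the
    max of its own head and its children's scores (single level shown)."""

    def __init__(self, in_dim, n_terms, children):
        super().__init__()
        self.heads = nn.Linear(in_dim, n_terms)  # one logit per selected term
        self.children = children  # term index -> list of child term indices

    def forward(self, p):
        raw = torch.sigmoid(self.heads(p))  # (batch, n_terms)
        cols = []
        for t in range(raw.size(1)):
            kids = self.children.get(t, [])
            if kids:
                # parent score = max of its own head and its children's scores
                cols.append(torch.max(raw[:, [t] + kids], dim=1).values)
            else:
                cols.append(raw[:, t])
        return torch.stack(cols, dim=1)

# Training touches only the ~590 selected MF terms; the remaining MF
# terms never enter the model and only matter at evaluation time.
model = HierarchicalHeads(in_dim=1024, n_terms=590, children={0: [1]})
p = torch.randn(8, 1024)                        # batch of protein vectors
labels = torch.randint(0, 2, (8, 590)).float()  # selected-term labels only
loss = nn.functional.binary_cross_entropy(model(p), labels)
```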