DeepRank / deeprank2

An open-source deep learning framework for data mining of protein-protein interfaces or single-residue variants.
https://deeprank2.readthedocs.io/en/latest/?badge=latest
Apache License 2.0

Clarify in docs and tutorials the `output` column in the results' dataset #617

Closed — rubenalv closed this 2 months ago

rubenalv commented 2 months ago

(Apologies if the question is too obvious). After training and testing with the tutorial https://github.com/DeepRank/deeprank2/blob/main/tutorials/training.ipynb, I get to this:

```
>>> output_test
     phase  epoch                      entry                                     output  target      loss
0  testing   18.0  residue-ppi:M-P:BA-113208   [0.4746413230895996, 0.5253586769104004]     1.0  0.668939
1  testing   18.0  residue-ppi:M-P:BA-135488   [0.4774721562862396, 0.5225278735160828]     1.0  0.668939
2  testing   18.0  residue-ppi:M-P:BA-136144    [0.533593475818634, 0.4664064645767212]     0.0  0.668939
3  testing   18.0  residue-ppi:M-P:BA-114113   [0.4767359495162964, 0.5232640504837036]     0.0  0.668939
```

The target is defined (for PPI classification) as not binding (0) or binding (1) in the same tutorial. I thought the values in `output` were the result of the softmax (left: probability of not binding, right: probability of binding), but that does not match the binary target column in the example above. So what are these values?

gcroci2 commented 2 months ago

> I thought the values in `output` were the result of the softmax (left: probability of not binding, right: probability of binding), but that does not match the binary target column in the example above. So what are these values?

They are actually the result of a softmax, as you can see from here. The `output` column contains a list with two elements, as you thought: the first element is the predicted probability that the data point belongs to class 0 (the non-binder class in the tutorial's example), and the second is the predicted probability that it belongs to class 1 (the binder class).
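
To make the two numbers concrete, here is a minimal sketch in plain PyTorch (not deeprank2 internals; the logit values are made up) of how a two-element softmax output is produced for one data point:

```python
import torch

# hypothetical raw network outputs (logits) for one data point
logits = torch.tensor([-0.05, 0.05])

# softmax turns the logits into two probabilities that sum to 1
probs = torch.softmax(logits, dim=0)  # -> tensor([0.4750, 0.5250])

# probs[0]: predicted probability of class 0 (non-binder in the tutorial)
# probs[1]: predicted probability of class 1 (binder)
print(probs, probs.sum())
```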

The binary target column (`target`) is, as the name suggests, the target you aim for (the true value, the true label). If the output does not match the target, the prediction is wrong. In the 4 rows you reported above, the model predicts the right output 3 times out of 4 if you take 0.5 as the threshold: entry 0 would be a 1 (its class-1 probability of ~0.53 is above the 0.5 threshold, so the prediction counts as a 1), entry 1 would be a 1, and entry 2 would be a 0. Only entry 3 would be incorrect, since a threshold of 0.5 would give a 1 while the target in this case is 0. In general you can't expect good results in the tutorials, since we're using only 100 data points in total, and even fewer to train the models.
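
As an illustration only (the column names follow the dataframe shown above, the probabilities are rounded, and the entry names are shortened), this is how you could turn the `output` lists into hard predictions with a 0.5 threshold and check them against `target`:

```python
import pandas as pd

# the four test rows shown above, rounded and with shortened entry names
output_test = pd.DataFrame({
    "entry": ["BA-113208", "BA-135488", "BA-136144", "BA-114113"],
    "output": [[0.4746, 0.5254], [0.4775, 0.5225], [0.5336, 0.4664], [0.4767, 0.5233]],
    "target": [1.0, 1.0, 0.0, 0.0],
})

# predicted class = 1 if the class-1 probability exceeds the 0.5 threshold
output_test["prediction"] = output_test["output"].apply(lambda p: int(p[1] > 0.5))
output_test["correct"] = output_test["prediction"] == output_test["target"]

print(output_test[["entry", "prediction", "target", "correct"]])
print(f"accuracy: {output_test['correct'].mean():.2f}")  # 3/4 correct -> 0.75
```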

The reason for keeping the probabilities for all the classes (here only two, the 0-class and the 1-class) is that there are many different ways of computing metrics. Depending on the predictor's application, some users may be interested only in the 0-class probability or only in the 1-class probability, and the threshold for assigning one class or the other can also be tuned in very different ways.
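
For example (an illustrative sketch using scikit-learn, brought in here just for the metric; the numbers are the rounded class-1 probabilities from the rows above), one user might compute a threshold-free metric such as ROC AUC on the class-1 probability, while another might apply a stricter custom threshold:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

targets = np.array([1, 1, 0, 0])                          # true labels from the rows above
prob_class1 = np.array([0.5254, 0.5225, 0.4664, 0.5233])  # second element of 'output'

# threshold-free metric computed directly on the class-1 probabilities
print("AUC:", roc_auc_score(targets, prob_class1))

# a stricter custom threshold, e.g. only call a binder above 0.6
predictions = (prob_class1 > 0.6).astype(int)
print("predictions at 0.6 threshold:", predictions)
```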

I will leave this issue open for clarifying this further in the tutorials and in the documentation.

I hope this clears up your question, but please let me know if that's not the case :)