Closed (joannacknight closed this issue 6 months ago)
What do we want as the output of the evaluation script?
The predictions files will contain individual predictions for each record (in the dev or test dataset, as appropriate), both the logits and the score with the activation function applied.
Should the evaluation just output the MCC, precision, recall, accuracy, and F1?
As discussed this morning, evaluation includes the metrics MCC, precision, recall, accuracy and F1. Will add plots at a later stage if we want them for the report.
@radka-j - the code is ready for your review; once approved we can merge into main, close this down and move on to issue #60
Some extra functionality I think we need - to discuss:
We also need to do evaluation for different binarisation thresholds (picking the best threshold on the validation data)
We have a script to do this now for a threshold of 0.5.
I've created #84 to follow on from this to evaluate over different thresholds.
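A minimal sketch of what picking the best threshold on the validation data could look like, assuming we have the validation logits and labels as numpy arrays and select the threshold that maximises MCC (function and variable names here are illustrative, not the repo's actual API):

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid
from sklearn.metrics import matthews_corrcoef


def best_threshold(val_logits: np.ndarray, val_labels: np.ndarray,
                   thresholds=np.linspace(0.05, 0.95, 19)) -> float:
    """Return the binarisation threshold that maximises MCC on the validation data."""
    scores = expit(val_logits)  # apply the sigmoid activation to the raw logits
    mccs = [matthews_corrcoef(val_labels, scores >= t) for t in thresholds]
    return float(thresholds[int(np.argmax(mccs))])
```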
I'll log progress of generating test results in #60
We have the ability to make predictions using an existing checkpoint. These predictions are the output of the model without the sigmoid activation function being applied. We need to add functionality to turn these outputs into binary predictions and calculate the MCC and other metrics.
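A hedged sketch of the functionality described above: apply the sigmoid to the raw checkpoint outputs (logits), binarise at a threshold, and compute MCC, precision, recall, accuracy and F1. The names below are assumptions for illustration, not the project's actual code:

```python
import numpy as np
from scipy.special import expit  # sigmoid
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)


def evaluate_logits(logits: np.ndarray, labels: np.ndarray,
                    threshold: float = 0.5) -> dict:
    """Binarise sigmoid(logits) at `threshold` and return the evaluation metrics."""
    scores = expit(logits)                      # sigmoid-activated scores
    preds = (scores >= threshold).astype(int)   # binary predictions
    return {
        "mcc": matthews_corrcoef(labels, preds),
        "precision": precision_score(labels, preds, zero_division=0),
        "recall": recall_score(labels, preds, zero_division=0),
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, zero_division=0),
    }
```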