EPPI-Reviewer: missing information

gimoAI commented 2 years ago

There are a couple of features/properties of EPPI-Reviewer that I was unable to find in the literature or documentation:

[ ] I believe no balancing method is used: the machine learning methodology is described quite thoroughly and no balancing step appears in it, but this is not stated explicitly.
[ ] Does the export file contain the rank order of the unseen records (yes/no)?
[ ] It is possible to export inclusions or exclusions as separate files, but is it also possible to export a single file with all labeling decisions? If yes, is it possible to re-import that file into EPPI-Reviewer while retaining the labeling decisions?
[ ] For the supervised model, where users can build their own classifier, the documentation says "this function enables you to build a linear classifier from a bag of words representation of your studies, using the scikit-learn python library)". However, it does not state which linear classifier is used. I assume an SVM is used since that is also used for vector learning, but this needs to be verified.

A lot of documentation can be found in the EPPI-Reviewer v4.8 manual.

Tanja19zpid commented 1 year ago

Regarding task 4: In the documentation on machine learning in EPPI (https://eppi.ioe.ac.uk/CMS/Portals/35/machine_learning_in_eppi-reviewer_v_7_web_version.pdf) it says: "The algorithm we use is a support vector machine as implemented in the Scikit-Learn Python machine library." The SVM classifiers in this package seem to be able to handle imbalanced data (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.balanced_accuracy_score.html), which might also be helpful for task 1.
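A side note on the second link: that page documents `balanced_accuracy_score`, which is an evaluation metric (the mean of per-class recall) rather than a setting of the SVM itself. A minimal sketch with made-up labels, not EPPI-Reviewer data, of why the metric matters when relevant records are rare:

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Made-up screening labels: 9 irrelevant records (0) and 1 relevant record (1).
y_true = [0] * 9 + [1]

# A degenerate classifier that labels every record as irrelevant.
y_pred = [0] * 10

# Plain accuracy rewards always predicting the majority class.
acc = accuracy_score(y_true, y_pred)

# Balanced accuracy averages recall over both classes:
# recall(0) = 9/9, recall(1) = 0/1.
bal_acc = balanced_accuracy_score(y_true, y_pred)

print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")
```

So the metric helps to evaluate a classifier on imbalanced data, but it does not tell us whether any balancing is applied during training.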

Rensvandeschoot commented 1 year ago

Good point! An SVM can handle imbalanced data by assigning different weights to the classes or by using other cost-sensitive learning techniques. Scikit-learn's SVM implementations expose a "class_weight" parameter for this. Setting it to "balanced" makes the algorithm adjust the weights automatically, inversely proportional to the class frequencies. But do we know whether this is the case in EPPI-Reviewer?
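For illustration, a minimal sketch of what class weighting looks like in scikit-learn. The texts and labels are made up, and the choice of `CountVectorizer` plus `LinearSVC` is an assumption on my side based on the "bag of words" and "support vector machine" wording in the documentation. It is not known to be the EPPI-Reviewer configuration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_class_weight

# Made-up screening data: 1 = include, 0 = exclude (2 vs 6 records).
texts = [
    "randomized trial of screening intervention",
    "controlled trial of text screening tool",
    "editorial on funding policy",
    "conference announcement and call for papers",
    "book review of a history monograph",
    "letter to the editor about journal fees",
    "obituary of a former society president",
    "erratum for a previous issue",
]
labels = np.array([1, 1, 0, 0, 0, 0, 0, 0])

# "balanced" uses n_samples / (n_classes * class_count) per class:
# class 0 -> 8 / (2 * 6), class 1 -> 8 / (2 * 2).
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=labels
)
print(dict(zip([0, 1], weights)))

# Bag-of-words features fed into a linear SVM with balanced class weights.
model = make_pipeline(CountVectorizer(), LinearSVC(class_weight="balanced"))
model.fit(texts, labels)
predictions = model.predict(texts)
```

With these weights, misclassifying one of the rare inclusions costs three times as much as misclassifying an exclusion. Leaving `class_weight` at its default of `None` treats both classes equally, which is what "no balancing" would correspond to.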

Rensvandeschoot / software-overview-machine-learning-for-screening-text

EPPI-Reviewer: missing information #21