Open gimoAI opened 1 year ago
Task 2: The Re-Rank Continuous AI Reprioritization feature can be enabled in the settings. Re-ranking is then triggered every 25 references (http://v2dis-help.evidencepartners.com/1/en/topic/using-re-rank-in-your-project?q=Re-Rank). It should therefore not be possible to re-rank before the first set of 25 records has been screened.

Task 4: "DistillerAI uses a SVM (support vector machine) non-probabilistic binary linear classifier to make its predictions. This is a natural language processing classifier." (http://v2dis-help.evidencepartners.com/1/en/topic/ai-preview-and-rank?q=SVM)

Task 5: One element of the Classifier Statistics screen is a Balanced Accuracy Score, which is more indicative of a classifier's performance when classes are highly imbalanced (http://v2dis-help.evidencepartners.com/1/en/topic/validating-classifiers?q=balanced)

Task 6 (I hope this is what is meant by query strategies): An overview of the possible strategies can be found here: http://v2dis-help.evidencepartners.com/1/en/topic/searching-curatorcr-repositories
Thank you for providing the additional information on DistillerSR. I've created a Pull Request to incorporate your suggestions and update the repository accordingly. You can review the proposed changes here: https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/pull/58.
Task 5: One element of the Classifier Statistics screen is a Balanced Accuracy Score, which is more indicative of a classifier's performance when classes are highly imbalanced (http://v2dis-help.evidencepartners.com/1/en/topic/validating-classifiers?q=balanced)
Thank you for pointing that out. Although the reference provided information on Balanced Accuracy Score, it doesn't explicitly mention the exact balancing strategy implemented in DistillerSR. Or am I missing some information?
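To make the distinction concrete: balanced accuracy is an evaluation metric, not a strategy for balancing the training data. The sketch below uses the standard definition (the mean of sensitivity and specificity) on made-up labels; the function names and the data are illustrative assumptions and say nothing about how DistillerSR computes or uses the score internally.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on label 1) and specificity (recall on label 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical, highly imbalanced screening set: 5 relevant records out of 100.
y_true = [1] * 5 + [0] * 95
# A useless classifier that marks every record as irrelevant.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy rewards the classifier for the majority class, while balanced accuracy exposes that it finds none of the relevant records. Neither number tells us whether the training data was rebalanced.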
Task 6 (I hope this is what is meant by query strategies): An overview of the possible strategies can be found here: http://v2dis-help.evidencepartners.com/1/en/topic/searching-curatorcr-repositories
It seems that the information provided on the website you referenced pertains to searching within CuratorCR Repositories, rather than the query strategies utilized for determining the order in which texts are presented to the user during the screening process. For instance, when a certainty-based query strategy is chosen, documents are displayed in the order of relevance score, with the most likely relevant document shown first. Or, when a random strategy is employed, documents are presented in a random order, completely disregarding the model output. Unfortunately, DistillerSR's documentation does not appear to provide explicit information on these types of query strategies.
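As a minimal sketch of what is meant by a query strategy here, the two orderings described above could look like the following. The function names, record IDs, and scores are hypothetical and only illustrate the concept; this is not DistillerSR's implementation.

```python
import random


def certainty_based_order(scores):
    """Return record IDs sorted by relevance score, most likely relevant first."""
    return sorted(scores, key=scores.get, reverse=True)


def random_order(scores, seed=None):
    """Return record IDs in random order, ignoring the model output entirely."""
    ids = list(scores)
    random.Random(seed).shuffle(ids)
    return ids


# Hypothetical relevance scores produced by a classifier for unscreened records.
scores = {"rec_a": 0.91, "rec_b": 0.12, "rec_c": 0.57}

print(certainty_based_order(scores))  # ['rec_a', 'rec_c', 'rec_b']
print(random_order(scores, seed=0))   # some permutation of the three IDs
```

The question for each tool is which of these orderings (or a mix of them) decides the next record shown to the screener.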
After reviewing the documentation you provided, it appears that DistillerSR employs either random or certainty-based query strategies (without other mixed options or clustering). So, I have added random (when the re-rank feature is turned off) and certainty-based as the available query strategies in DistillerSR.
There are a couple of features/properties about DistillerSR that I was unable to find in literature and/or documentation:
Re-Rank References Now: it is not clear whether this button can be used before 25 references or 2% of the training data have been screened. The button is documented here.