Rensvandeschoot / software-overview-machine-learning-for-screening-text

The repository aims to create an overview and comparison of software used for systematically screening large amounts of textual data using machine learning.
Creative Commons Attribution 4.0 International

DistillerSR: missing information #54

Open gimoAI opened 1 year ago

gimoAI commented 1 year ago

There are a couple of features/properties about DistillerSR that I was unable to find in literature and/or documentation:

Tanja19zpid commented 1 year ago

Task 2: The Re-Rank Continuous AI Reprioritization can be enabled in the settings. Re-ranking is then triggered every 25 references (http://v2dis-help.evidencepartners.com/1/en/topic/using-re-rank-in-your-project?q=Re-Rank). Thus, it should not be possible to re-rank before the first set of 25 records has been screened.

Task 4: “DistillerAI uses a SVM (support vector machine) non-probabilistic binary linear classifier to make its predictions. This is a natural language processing classifier.” (http://v2dis-help.evidencepartners.com/1/en/topic/ai-preview-and-rank?q=SVM)

Task 5: One element of the Classifier Statistics Screen is a Balanced Accuracy Score, which is more indicative of a classifier's performance when classes are highly imbalanced (http://v2dis-help.evidencepartners.com/1/en/topic/validating-classifiers?q=balanced).

Task 6 (I hope this is what is meant by query strategies): An overview of the possible strategies can be found here: http://v2dis-help.evidencepartners.com/1/en/topic/searching-curatorcr-repositories
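To make the Task 2 point concrete, here is a minimal sketch of a screening loop that only re-ranks after every 25 screened references. The 25-record interval comes from the help page cited above; the function name `screen_with_rerank` and the `rerank` callback are hypothetical and are not part of DistillerSR's API.

```python
from typing import Callable, List, Sequence

RERANK_INTERVAL = 25  # interval quoted from the DistillerSR help page


def screen_with_rerank(
    records: Sequence[str],
    rerank: Callable[[List[str], List[str]], List[str]],
    interval: int = RERANK_INTERVAL,
) -> List[int]:
    """Screen records one by one and re-rank after every `interval` decisions.

    `rerank(screened, remaining)` is a hypothetical callback that returns the
    remaining records in a new order. The return value lists the number of
    screened records at each point where a re-rank was triggered.
    """
    remaining = list(records)
    screened: List[str] = []
    triggers: List[int] = []
    while remaining:
        # The reviewer screens the record currently at the top of the queue.
        screened.append(remaining.pop(0))
        # Re-rank only when a full batch has been screened and work remains.
        if len(screened) % interval == 0 and remaining:
            remaining = rerank(screened, remaining)
            triggers.append(len(screened))
    return triggers


records = [f"ref-{i}" for i in range(60)]
# Placeholder re-ranker: reverse-sorts the remaining IDs instead of using a model.
triggers = screen_with_rerank(records, lambda done, todo: sorted(todo, reverse=True))
print(triggers)  # [25, 50]
```

With 60 records, the first re-rank happens only after record 25, which matches the observation that no re-ranking is possible before the first set of 25 has been screened.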

Rensvandeschoot commented 1 year ago

Thank you for providing the additional information on DistillerSR. I've created a Pull Request to incorporate your suggestions and update the repository accordingly. You can review the proposed changes here: https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/pull/58.

Rensvandeschoot commented 1 year ago

Task 5: One element of the Classifier Statistics Screen is a Balanced Accuracy Score, that is more indicative for the performance of a classifier, if classes are highly imbalanced (http://v2dis-help.evidencepartners.com/1/en/topic/validating-classifiers?q=balanced)

Thank you for pointing that out. Although the reference describes the Balanced Accuracy Score, it does not explicitly state which balancing strategy DistillerSR implements. Or am I missing some information?
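To illustrate the distinction: balanced accuracy is an evaluation metric, not a strategy for rebalancing the training data. The sketch below assumes the standard definition (the mean of sensitivity and specificity); the documentation cited in this thread does not confirm the exact formula DistillerSR uses, and the function names are my own.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of records whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity (standard definition, assumed here)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on relevant records
    specificity = tn / (tn + fp)  # recall on irrelevant records
    return (sensitivity + specificity) / 2


# Illustrative data: 50 relevant records among 1000, and a classifier
# that labels every record as irrelevant.
y_true = [1] * 50 + [0] * 950
y_pred = [0] * 1000
print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The metric exposes a useless classifier that plain accuracy would reward, but it says nothing about how, or whether, the training set is rebalanced.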

Rensvandeschoot commented 1 year ago

Task 6 (I hope that this is meant by query strategies): An overview of the possible strategies can be found here: http://v2dis-help.evidencepartners.com/1/en/topic/searching-curatorcr-repositories

It seems that the information provided on the website you referenced pertains to searching within CuratorCR Repositories, rather than the query strategies utilized for determining the order in which texts are presented to the user during the screening process. For instance, when a certainty-based query strategy is chosen, documents are displayed in the order of relevance score, with the most likely relevant document shown first. Or, when a random strategy is employed, documents are presented in a random order, completely disregarding the model output. Unfortunately, DistillerSR's documentation does not appear to provide explicit information on these types of query strategies.

After reviewing the documentation you provided, it appears that DistillerSR employs either random or certainty-based query strategies (without other mixed options or clustering). So, I have added random (when the re-rank feature is turned off) and certainty-based as the available query strategies in DistillerSR.
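As a rough illustration of the two query strategies described above, the sketch below orders documents either by model relevance score (certainty-based) or at random, ignoring the scores. The function name, the strategy labels, and the example scores are made up for illustration; this is not DistillerSR's implementation.

```python
import random
from typing import Dict, List, Optional


def order_for_screening(
    scores: Dict[str, float],
    strategy: str,
    seed: Optional[int] = None,
) -> List[str]:
    """Return document IDs in the order they would be shown to the screener.

    `scores` maps a document ID to the model's relevance score (hypothetical
    values). "certainty" shows the most likely relevant document first;
    "random" ignores the scores entirely.
    """
    if strategy == "certainty":
        return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    if strategy == "random":
        doc_ids = list(scores)
        random.Random(seed).shuffle(doc_ids)
        return doc_ids
    raise ValueError(f"unknown strategy: {strategy!r}")


scores = {"doc-a": 0.12, "doc-b": 0.91, "doc-c": 0.47}
print(order_for_screening(scores, "certainty"))  # ['doc-b', 'doc-c', 'doc-a']
print(order_for_screening(scores, "random", seed=0))  # some permutation of the IDs
```

Under this reading, turning the re-rank feature off corresponds to the "random" branch, since the model output never influences the presentation order.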