DistillerSR: missing information

gimoAI commented 1 year ago

I was unable to find the following features and properties of DistillerSR in the literature or the documentation:

[ ] It is possible to export inclusions or exclusions as separate files, but is it also possible to export a single file with labeling decisions? If yes, is it possible to re-import those back into DistillerSR, retaining the labeling decisions?
[x] Concerning minimum training data: DistillerSR also has a Re-Rank References Now button, but it is not clear whether it can be used before 25 references or 2% of the training data have been screened. The button is documented here.
[ ] Which feature extraction method is used for the re-ranking?
[x] Which classifier is used for the re-ranking?
[ ] Which balancing strategies are available?
[x] Which query strategies are available?

Tanja19zpid commented 1 year ago

Task 2: The Re-Rank Continuous AI Reprioritization can be enabled in the settings. The re-ranking is then triggered every 25 references (http://v2dis-help.evidencepartners.com/1/en/topic/using-re-rank-in-your-project?q=Re-Rank). Thus, it should not be possible to re-rank before the first set of 25 records has been screened.

Task 4: “DistillerAI uses a SVM (support vector machine) non-probabilistic binary linear classifier to make its predictions. This is a natural language processing classifier.” (http://v2dis-help.evidencepartners.com/1/en/topic/ai-preview-and-rank?q=SVM)

Task 5: One element of the Classifier Statistics Screen is a Balanced Accuracy Score, which is more indicative of a classifier's performance when classes are highly imbalanced (http://v2dis-help.evidencepartners.com/1/en/topic/validating-classifiers?q=balanced)

Task 6 (I hope that this is meant by query strategies): An overview of the possible strategies can be found here: http://v2dis-help.evidencepartners.com/1/en/topic/searching-curatorcr-repositories
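
To illustrate why the Balanced Accuracy Score mentioned under Task 5 is more informative than plain accuracy when classes are imbalanced, here is a minimal Python sketch. It uses the standard definition (the mean of the per-class recalls); the function name and the toy data are made up for illustration and are not taken from DistillerSR.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the recall on the relevant class (1) and on the irrelevant class (0)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


# Toy screening set (assumption): 5 relevant and 95 irrelevant references.
y_true = [1] * 5 + [0] * 95
# A useless classifier that labels every reference as irrelevant.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy rewards the classifier for the 95 easy exclusions, while balanced accuracy shows that it finds none of the relevant references.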

Rensvandeschoot commented 1 year ago

Thank you for providing the additional information on DistillerSR. I've created a Pull Request to incorporate your suggestions and update the repository accordingly. You can review the proposed changes here: https://github.com/Rensvandeschoot/software-overview-machine-learning-for-screening-text/pull/58.

Rensvandeschoot commented 1 year ago

> Task 5: One element of the Classifier Statistics Screen is a Balanced Accuracy Score, which is more indicative of a classifier's performance when classes are highly imbalanced (http://v2dis-help.evidencepartners.com/1/en/topic/validating-classifiers?q=balanced)

Thank you for pointing that out. Although the reference provides information on the Balanced Accuracy Score, it doesn't explicitly mention which balancing strategy is implemented in DistillerSR. Or am I missing something?
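
For contrast, a balancing strategy changes the training data itself rather than the evaluation metric. The Python sketch below shows random undersampling, one common generic strategy. Whether DistillerSR uses this or any other balancing strategy is not stated in the documentation cited in this thread, and all names below are hypothetical.

```python
import random


def undersample(labelled, seed=0):
    """Randomly drop majority-class records until both classes have equal size.

    `labelled` is a list of (record_id, label) pairs, label 1 = included, 0 = excluded.
    """
    rng = random.Random(seed)
    included = [pair for pair in labelled if pair[1] == 1]
    excluded = [pair for pair in labelled if pair[1] == 0]
    # The shorter list is the minority class, the longer one the majority class.
    minority, majority = sorted([included, excluded], key=len)
    return minority + rng.sample(majority, len(minority))


# Toy data (assumption): 5 inclusions and 95 exclusions.
labelled = [(f"rec_{i}", 1 if i < 5 else 0) for i in range(100)]
balanced = undersample(labelled)
print(len(balanced))                                   # 10
print(sum(1 for _, label in balanced if label == 1))   # 5
```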

Rensvandeschoot commented 1 year ago

> Task 6 (I hope that this is meant by query strategies): An overview of the possible strategies can be found here: http://v2dis-help.evidencepartners.com/1/en/topic/searching-curatorcr-repositories

It seems that the information provided on the website you referenced pertains to searching within CuratorCR Repositories, rather than the query strategies utilized for determining the order in which texts are presented to the user during the screening process. For instance, when a certainty-based query strategy is chosen, documents are displayed in the order of relevance score, with the most likely relevant document shown first. Or, when a random strategy is employed, documents are presented in a random order, completely disregarding the model output. Unfortunately, DistillerSR's documentation does not appear to provide explicit information on these types of query strategies.
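
The difference between the two query strategies described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not DistillerSR's implementation; the function names, record ids, and scores are invented.

```python
import random


def certainty_based_order(scores):
    """Present records by descending relevance score (most likely relevant first)."""
    return sorted(scores, key=scores.get, reverse=True)


def random_order(scores, seed=None):
    """Present records in random order, ignoring the model output entirely."""
    record_ids = list(scores)
    random.Random(seed).shuffle(record_ids)
    return record_ids


# Hypothetical relevance scores produced by some classifier.
scores = {"rec_a": 0.12, "rec_b": 0.91, "rec_c": 0.47}

print(certainty_based_order(scores))  # ['rec_b', 'rec_c', 'rec_a']
print(random_order(scores, seed=0))   # some permutation of the three ids
```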

After reviewing the documentation you provided, it appears that DistillerSR employs either random or certainty-based query strategies (without other mixed options or clustering). So, I have added random (when the re-rank feature is turned off) and certainty-based as the available query strategies in DistillerSR.

Rensvandeschoot / software-overview-machine-learning-for-screening-text

DistillerSR: missing information #54