comic / evalutils

evalutils helps users create extensions for grand-challenge.org
https://grand-challenge.org
MIT License

Add static reports? #60

Closed: jmsmkn closed this issue 5 years ago

jmsmkn commented 6 years ago

The CXR group has been using the HTML reports MeVisLab module to generate static reports of algorithm results on datasets. They've found it difficult to work with, so Erdi is thinking about adding this functionality to either evalutils or Grand Challenge.

If we were to do this in evalutils, we could:

jmsmkn commented 6 years ago

cc: @xarion

xarion commented 6 years ago

Hey, I am copying a case where I would have used this. The results.csv file contains my results. It has these columns:

Image ID - the filename
Annotation - the annotation by a radiologist
Dataset Label - the label provided by the dataset
Dataset-Annotation Agreement - the agreement between the dataset label and the annotation
Prediction - the model's predicted class ID
Corrected - whether the model prediction agrees with the radiologist but not with the dataset
Worsened - whether the model prediction disagrees with both the radiologist and the dataset
Confidence - the model's confidence for this case
Uncertainty - the model's uncertainty for this case

It also contains the PNG files referenced, by relative path, in the Image ID field.

What I do with this file is sort it by confidence and uncertainty, to see which kinds of images have high or low confidence/uncertainty, and then check their Corrected and Worsened states. Hope this helps, thanks a bunch. erdi-sample-data.zip
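
For concreteness, a minimal pandas sketch of that workflow, assuming the column names described above and a results.csv in the working directory:

    import pandas as pd

    df = pd.read_csv("results.csv")

    # Rank cases by model confidence (highest first) and
    # uncertainty (lowest first).
    ranked = df.sort_values(by=["Confidence", "Uncertainty"],
                            ascending=[False, True])

    # Inspect the Corrected/Worsened state at both ends of the ranking.
    cols = ["Image ID", "Confidence", "Uncertainty", "Corrected", "Worsened"]
    print(ranked[cols].head(10))  # most confident cases
    print(ranked[cols].tail(10))  # least confident cases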

jmsmkn commented 6 years ago

Hi Erdi

Are the definitions for Corrected and Worsened correct?

Take row 2 of the results: the annotation is 0 (radiologist), the dataset label is 0 (dataset), and the prediction is 0.

The prediction agrees with both the radiologist and the dataset, so given the definitions above, both Corrected and Worsened should be 0.

But there are other cases where the value of Worsened does not agree with your definition above (eg row 3, where the prediction disagrees with the radiologist but agrees with the dataset, so Worsened should be 0, yet it has the value 1).
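
For reference, the check described above could be expressed in pandas along these lines; this is a sketch using the column names from Erdi's comment, not the actual code from the commit linked below:

    import pandas as pd

    df = pd.read_csv("results.csv")

    # Per the definitions above: Corrected means the prediction agrees with
    # the radiologist's annotation but not with the dataset label; Worsened
    # means the prediction disagrees with both.
    expected_corrected = (df["Prediction"] == df["Annotation"]) & (
        df["Prediction"] != df["Dataset Label"]
    )
    expected_worsened = (df["Prediction"] != df["Annotation"]) & (
        df["Prediction"] != df["Dataset Label"]
    )

    # Rows where the recorded flags differ from the stated definitions,
    # such as row 3 described above.
    mismatches = df[
        (df["Corrected"].astype(bool) != expected_corrected)
        | (df["Worsened"].astype(bool) != expected_worsened)
    ]
    print(mismatches)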

jmsmkn commented 6 years ago

PS: you can see my logic in this commit: https://github.com/DIAGNijmegen/rse-static-report-test/commit/930d16234ef7e1a799d17d46f8fc429c910a04f3

This results in:

"Corrected": {
            "0": 0.0,
            "1": 0.0,
            "2": 0.0,
            "3": 0.0,
            "4": 0.0,
            "5": 0.0,
            "6": 0.0,
            "7": 0.0,
            "8": 0.0,
            "9": 0.0,
            "10": 0.0,
            "11": 0.0,
            "12": 0.0,
            "13": 0.0,
            "14": 0.0,
            "15": 0.0,
            "16": 0.0,
            "17": 0.0,
            "18": 1.0
        },
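
For illustration only, a fragment of that shape could come from serialising the Corrected column keyed by row index; this is a hypothetical snippet, not the code from the commit above:

    import json

    import pandas as pd

    df = pd.read_csv("results.csv")

    # Key each Corrected value by its row index, matching the string keys
    # and float values seen in the fragment above.
    corrected = {str(i): float(v) for i, v in df["Corrected"].items()}
    print(json.dumps({"Corrected": corrected}, indent=4))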
xarion commented 6 years ago

Hey James, your logic is correct. We apply our method only to cases where the prediction does not match the dataset label; for cases where the prediction does match the dataset label, we expect Corrected and Worsened to be 0.