cc: @xarion
Hey, I am copying in a sample case that I would have used.
The attached results.csv contains my results. It has the following columns:
- Image ID: the filename
- Annotation: the annotation by a radiologist
- Dataset Label: the labels provided by the dataset
- Dataset-Annotation Agreement: the agreement between the labels and the annotations
- Prediction: the model's predicted class ID
- Corrected: whether the model prediction agrees with the radiologist but not with the dataset
- Worsened: whether the model prediction disagrees with both the radiologist and the dataset
- Confidence: the model's confidence for this case
- Uncertainty: the model's uncertainty for this case
The attachment also contains the PNG files referenced by relative path in the Image ID field.
What I do with this file is sort it by confidence and by uncertainty, to see what kinds of images have high or low confidence/uncertainty, and then check their Corrected and Worsened states. Hope this helps, thanks a bunch. erdi-sample-data.zip
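For illustration, here is a minimal pandas sketch of that workflow, assuming the column names from the list above (the actual data is in erdi-sample-data.zip):

```python
import pandas as pd

# Load the results; column names are taken from the description above.
results = pd.read_csv("results.csv")

# Sort by model confidence and uncertainty to inspect the extremes.
by_confidence = results.sort_values("Confidence", ascending=False)
by_uncertainty = results.sort_values("Uncertainty", ascending=False)

# Look at the Corrected / Worsened state of the extreme cases.
cols = ["Image ID", "Confidence", "Uncertainty", "Corrected", "Worsened"]
print(by_confidence[cols].head())   # most confident predictions
print(by_confidence[cols].tail())   # least confident predictions
print(by_uncertainty[cols].head())  # most uncertain predictions
```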
Hi Erdi
Are the definitions for Corrected and Worsened correct?
Take row 2 of the results: the annotation (radiologist) is 0, the dataset label (dataset) is 0, and the prediction is 0. The prediction agrees with both the radiologist and the dataset, so given the definitions above, Corrected should be zero; in the table it is 1. Worsened should be zero, and it is. But there are other cases where the value of Worsened does not agree with your definition (e.g. row 3, where the prediction disagrees with the radiologist but agrees with the dataset, so Worsened should be zero, but it has the value 1).
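For clarity, here is roughly how that check can be written in pandas (a sketch only, using the column names from the CSV description above, not the exact code from my commit):

```python
import pandas as pd

results = pd.read_csv("results.csv")

# Corrected: the prediction agrees with the radiologist but not the dataset.
expected_corrected = (
    (results["Prediction"] == results["Annotation"])
    & (results["Prediction"] != results["Dataset Label"])
)

# Worsened: the prediction disagrees with both radiologist and dataset.
expected_worsened = (
    (results["Prediction"] != results["Annotation"])
    & (results["Prediction"] != results["Dataset Label"])
)

# Rows where the flags in the CSV differ from the definitions above.
print(results[expected_corrected.astype(float) != results["Corrected"]])
print(results[expected_worsened.astype(float) != results["Worsened"]])
```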
PS: you can see my actual logic in this commit: https://github.com/DIAGNijmegen/rse-static-report-test/commit/930d16234ef7e1a799d17d46f8fc429c910a04f3
This results in:
"Corrected": {
"0": 0.0,
"1": 0.0,
"2": 0.0,
"3": 0.0,
"4": 0.0,
"5": 0.0,
"6": 0.0,
"7": 0.0,
"8": 0.0,
"9": 0.0,
"10": 0.0,
"11": 0.0,
"12": 0.0,
"13": 0.0,
"14": 0.0,
"15": 0.0,
"16": 0.0,
"17": 0.0,
"18": 1.0
},
Hey James, your logic is correct. We apply our method only to cases where the prediction does not match the dataset label; for the cases where it does match, we expect corrected and worsened to be 0.
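In other words, for rows where the prediction matches the dataset label, both flags should come out as 0, which is easy to check directly (a hypothetical continuation of the sketch above):

```python
# Rows the method is not applied to: prediction matches the dataset label.
not_applied = results[results["Prediction"] == results["Dataset Label"]]

# For these rows, both flags are expected to be 0.
assert (not_applied["Corrected"] == 0).all()
assert (not_applied["Worsened"] == 0).all()
```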
The CXR group has been using the MeVisLab HTML reports module to generate static reports of algorithm results on datasets. They've found that this is difficult to work with, so Erdi is thinking about adding this functionality to either evalutils or Grand Challenge.
If we were to do this in evalutils we could: