CCI-Bonn / HD-BM

Automated deep-learning based segmentation of brain metastases on MRI
Apache License 2.0

To replicate lesion-wise evaluation #5

Open · Joeycho opened this issue 4 months ago

Joeycho commented 4 months ago

Hi @TaWald,

I would like to check which information is crucial to replicate the lesion-wise evaluation used in the HD-BM paper.

What I have found in the supplement document is the following:

1) Resampling to fulfill 1x1x1 mm isotropic spacing
2) Dilation to connect voxels that are close enough (radius = 3, scikit-image v0.18.1)
3) Connected component analysis (probably I need more information about this step, such as the connectivity?)
4) Threshold for lesions (only predicted lesions with L-Dice > 0.1 count)
5) Report median (instead of mean) lesion-wise metrics per patient, to balance the effect of the number of lesions per patient and to deal with non-normality of the values
6) From the supplement document, does F1-score mean lesion-wise Dice? And does Dice mean case-wise Dice? I was confused.

I have found another repository, which also reports lesion-wise metrics (https://github.com/rachitsaluja/BraTS-2023-Metrics).

However, I noticed differences in detail between the HD-BM approach and the BraTS 2023 one. That is why I am asking you for more information or clarification.

TaWald commented 4 months ago

Hey Joeycho,

thanks for the inquiry! It's nice to see that people pay attention to these important details :) As you correctly identified, evaluations often differ because there is no single "right" way to create instances from semantic segmentation predictions, so there are different ways to go about it. Consequently, BraTS23 has its own pipeline, which is neither more nor less correct than mine.

Regarding your specific questions:

  1. resampling to fulfill 1x1x1 mm, isotropic spacing

This is optional, but it makes your morphological kernels (e.g. the ball) span the same physical distance along all directions.
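
As a minimal sketch of that resampling step, assuming SimpleITK is used (the library choice is my assumption; the paper only specifies the target spacing):

```python
import SimpleITK as sitk

def resample_to_isotropic(image, spacing=(1.0, 1.0, 1.0), is_label=True):
    """Resample to isotropic spacing; nearest neighbour keeps label maps binary."""
    original_spacing = image.GetSpacing()
    original_size = image.GetSize()
    new_size = [
        int(round(size * orig_sp / new_sp))
        for size, orig_sp, new_sp in zip(original_size, original_spacing, spacing)
    ]
    interpolator = sitk.sitkNearestNeighbor if is_label else sitk.sitkLinear
    return sitk.Resample(
        image, new_size, sitk.Transform(), interpolator,
        image.GetOrigin(), spacing, image.GetDirection(),
        0, image.GetPixelID(),
    )
```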

  3. Connected component analysis [...]

In the supplement we specified the connectivity as a 3D ball kernel. This represents connectivity along the edges, excluding the corner voxels of a 3D cube. In most libraries this corresponds to connectivity 2 (1 = faces, 2 = edges, 3 = corners).
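
To make this concrete, here is a rough sketch of the dilation and connected-component step with scikit-image; the toy `seg` array and the final step of carrying the labels back onto the undilated voxels are my own assumptions, only the ball radius and the connectivity come from the supplement:

```python
import numpy as np
from skimage.measure import label
from skimage.morphology import ball, binary_dilation

# Toy binary segmentation mask (assumed already resampled to 1 mm isotropic)
seg = np.zeros((64, 64, 64), dtype=bool)
seg[20:25, 20:25, 20:25] = True

# Dilate with a ball of radius 3 so that nearby voxels merge into one lesion
dilated = binary_dilation(seg, ball(3))

# Connected components with connectivity 2 (faces + edges, corners excluded)
instances = label(dilated, connectivity=2)

# One possible way to carry the instance labels back to the original voxels
instance_map = instances * seg
```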

  4. Threshold for lesions (predicted lesions with L-Dice > 0.1 only count)

This is generally a matter of taste. Many people in the domain use a threshold of != 0, which means that once a single voxel matches you have "detected" a lesion. I found this rather weak, especially for larger metastases.
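
To illustrate the threshold, here is a rough sketch of matching ground-truth and predicted instances with an L-Dice > 0.1 criterion; the greedy best-overlap matching and all function names are my assumptions, not necessarily the exact rule used in the paper:

```python
import numpy as np

def lesion_dice(gt_mask, pred_mask):
    """Voxel-wise Dice between one ground-truth lesion and one predicted lesion."""
    intersection = np.logical_and(gt_mask, pred_mask).sum()
    denom = gt_mask.sum() + pred_mask.sum()
    return 2.0 * intersection / denom if denom > 0 else 0.0

def match_lesions(gt_instances, pred_instances, threshold=0.1):
    """Greedily match instances; a pair only counts as detected if L-Dice > threshold."""
    matches = []
    for gt_id in np.unique(gt_instances):
        if gt_id == 0:  # background
            continue
        gt_mask = gt_instances == gt_id
        candidate_ids = np.unique(pred_instances[gt_mask])
        candidate_ids = candidate_ids[candidate_ids > 0]
        best_id, best_dice = None, 0.0
        for pred_id in candidate_ids:
            d = lesion_dice(gt_mask, pred_instances == pred_id)
            if d > best_dice:
                best_id, best_dice = pred_id, d
        if best_id is not None and best_dice > threshold:
            matches.append((gt_id, best_id, best_dice))
    return matches
```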

  5. Report median (instead of mean) lesion-wise metrics [...]

This is also a matter of personal preference (mean vs. median, that is). However, I would certainly recommend aggregating patient-wise first and then dataset-wise, so that each patient counts equally, as this IMHO is also the metric that matters in later downstream application.
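
A short illustration of that aggregation order, with made-up per-lesion Dice values:

```python
import numpy as np

# Hypothetical per-lesion Dice values for three patients
per_patient_lesion_dice = {
    "patient_01": [0.82, 0.75, 0.10],
    "patient_02": [0.91],
    "patient_03": [0.65, 0.70, 0.68, 0.05],
}

# Aggregate per patient first (median over that patient's lesions), then across
# the dataset, so every patient contributes equally regardless of lesion count.
patient_scores = [np.median(values) for values in per_patient_lesion_dice.values()]
dataset_score = np.median(patient_scores)
```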

  6. From the supplement document, does F1-score mean lesion-wise Dice? And does Dice mean case-wise Dice? I was confused.

DICE and F1 are basically the same formula. In the manuscript we use F1-Score exclusively for the detection metric (on instance level) and DICE for the voxel-wise measure (also on instance level). The F1 is calculated on the TP/FP/FN instances, as determined by whether a ground-truth instance has an L-DICE > 0.1 with a prediction instance. So each GT instance is either a TP or FN, and each predicted instance is either a TP or FP.
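
In other words, once the instances have been sorted into TP/FP/FN via the 0.1 L-DICE criterion, the instance-level F1 uses the same formula as Dice; a minimal sketch with hypothetical counts:

```python
def instance_f1(num_tp, num_fp, num_fn):
    """Instance-level F1: same formula as Dice, 2*TP / (2*TP + FP + FN)."""
    denom = 2 * num_tp + num_fp + num_fn
    return 2 * num_tp / denom if denom > 0 else 0.0

# e.g. 8 matched GT lesions, 2 unmatched predictions, 1 missed GT lesion
print(instance_f1(num_tp=8, num_fp=2, num_fn=1))  # -> 0.8421...
```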

I hope this clarifies some of your questions. If some remain, just let me know and I can explain further.


In case you intend to build your own pipeline: I am planning to publish an evaluation tool (including very simple instance creation) in June that should allow this kind of evaluation out of the box. So if you are under no time pressure, it may be worth waiting a few weeks and using the tooling as it comes out. If you want, I can ping you once it's out.

Joeycho commented 2 months ago

Hi @TaWald,

I hope you're doing well.

https://github.com/Project-MONAI/MetricsReloaded

Is this 'MetricsReloaded' what you meant, or will another evaluation tool come out? Yes, please ping me once it's out. Fortunately, I managed to meet the BraTS organizers and had a discussion with them about the challenges in lesion-wise evaluation (distance metrics in the lesion-wise approach). I think for the moment they will stick with https://github.com/rachitsaluja/BraTS-2023-Metrics, but once yours comes out, we might be able to discuss the details.