Borda / BIRL

BIRL: Benchmark on Image Registration methods with Landmark validations
http://borda.github.io/BIRL
BSD 3-Clause "New" or "Revised" License

GC submission #33

Closed — Borda closed this issue 5 years ago

Borda commented 5 years ago

Prepare an evaluation script for ANHIR according to https://anhir.grand-challenge.org/Evaluation
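A minimal sketch of the kind of per-case landmark evaluation the ANHIR page describes, assuming two CSV files of matching (x, y) landmark coordinates per case. The column names, the optional normalisation by the image diagonal, and the summary statistics are assumptions for illustration, not the official ANHIR metric definitions linked above.

```python
# Hypothetical sketch: per-case landmark (target registration) error.
# Column names and the diagonal normalisation are assumptions, not the
# official ANHIR evaluation definition.
import numpy as np
import pandas as pd


def case_landmark_errors(ref_csv, warped_csv, image_diagonal=None):
    """Euclidean distances between reference and warped landmarks of one case."""
    ref = pd.read_csv(ref_csv)[["x", "y"]].to_numpy()
    warped = pd.read_csv(warped_csv)[["x", "y"]].to_numpy()
    assert len(ref) == len(warped), "landmark counts must match"
    dists = np.linalg.norm(ref - warped, axis=1)
    if image_diagonal:  # relative error, normalised by the image diagonal
        dists = dists / image_diagonal
    return {
        "mean_error": float(dists.mean()),
        "median_error": float(np.median(dists)),
        "max_error": float(dists.max()),
    }


# Example (hypothetical file names):
# errors = case_landmark_errors("case01_ref.csv", "case01_warped.csv", image_diagonal=5000.0)
```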

Borda commented 5 years ago

There is documentation on how the automated evaluation works here: https://grand-challengeorg.readthedocs.io/en/latest/evaluation.html#evaluation-container-requirements

We have a Python library that implements this API: https://github.com/comic/evalutils (docs: https://evalutils.readthedocs.io/en/latest/?badge=latest).

It's pip installable on Python 3.6+. There is a "getting started" tutorial here: https://evalutils.readthedocs.io/en/latest/usage.html#getting-started

We only have primitives for Classification, Detection and Segmentation tasks there, but I think this is a good place to start. For your task, I'm not sure whether classification or detection is the better starting point: classification assumes that each row is a case, whereas detection assumes a different number of rows per case, so detection is probably the best place to start. I'd definitely like to add registration support to evalutils, so this would be a good test case.
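To illustrate the detection-style layout mentioned above — one row per landmark, a variable number of rows per case — here is a hedged sketch of grouping such a table by case id. The column names ("case", "x", "y") are assumptions, not anything prescribed by evalutils or grand-challenge.

```python
# Hypothetical detection-style table: one row per landmark, grouped by case.
# Column names are assumptions for illustration only.
import pandas as pd

predictions = pd.DataFrame({
    "case": ["case01", "case01", "case01", "case02", "case02"],
    "x": [10.0, 55.5, 120.0, 33.0, 80.2],
    "y": [20.0, 60.1, 140.5, 45.0, 90.8],
})

# Each case may contribute a different number of landmarks (rows).
for case_id, group in predictions.groupby("case"):
    print(case_id, "has", len(group), "landmarks")
```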

Borda commented 5 years ago

Yes, as long as you output the individual case results to metrics.json, they're stored in the database, and we can then share them with you if you want to do further analysis.

An example of this is on the results page for promise12: https://promise12.grand-challenge.org/evaluation/results/41a2b7aa-b36a-4434-afa8-2c2ef8c5fa2b/

This is an individual result; as you can see, the organisers keep "apex_dice" (and other metrics) for each of the 30 cases. You probably want to store the case id too.

If you use the "segmentation" option in evalutils, then this is all set up for you.
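For the manual route, a hedged sketch of assembling per-case results plus aggregates into a metrics.json. The key layout ("case" / "aggregates") and the /output/metrics.json path are assumptions for illustration; the schema and container requirements grand-challenge actually expects are described in the evaluation docs linked above.

```python
# Hypothetical sketch of writing per-case results and simple aggregates to
# metrics.json; the key layout and output path are assumptions, see the
# evaluation docs for the schema grand-challenge expects.
import json
import statistics

per_case = [
    {"case": "case01", "median_error": 1.8},
    {"case": "case02", "median_error": 2.4},
]

metrics = {
    "case": per_case,  # one entry per case, including the case id
    "aggregates": {
        "median_error_mean": statistics.mean(c["median_error"] for c in per_case),
    },
}

with open("/output/metrics.json", "w") as fp:
    json.dump(metrics, fp, indent=2)
```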