There is documentation on how the automated evaluation works here: https://grand-challengeorg.readthedocs.io/en/latest/evaluation.html#evaluation-container-requirements
We have a Python library that implements this API here: https://github.com/comic/evalutils. The docs are here: https://evalutils.readthedocs.io/en/latest/?badge=latest
It's pip installable on Python 3.6+. There is a "getting started" tutorial here: https://evalutils.readthedocs.io/en/latest/usage.html#getting-started
We only have primitives for Classification, Detection and Segmentation tasks there, but I think this is a good place to start. For your task, I'm not sure whether it's best to start from the classification or the detection task: classification assumes that each row is a case, whereas detection assumes that there is a different number of rows for each case, so I would probably say that detection is the best place to start. I'd definitely like to add registration support to evalutils, so this would be a good test case.
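For a registration task, the detection-style layout (several landmark rows per case) maps naturally onto a per-case distance metric. Here is a minimal sketch of that grouping in plain pandas; the file names and column names (case, landmark, x, y) are assumptions for illustration only, and this is deliberately not the evalutils API:

```python
import numpy as np
import pandas as pd

# Hypothetical layout: one row per landmark, a variable number of landmarks per case.
# The file and column names here are assumptions for this sketch.
ground_truth = pd.read_csv("ground_truth.csv")  # columns: case, landmark, x, y
predictions = pd.read_csv("predictions.csv")    # columns: case, landmark, x, y

# Pair each predicted landmark with its ground-truth counterpart and
# compute the Euclidean distance between the two positions.
merged = ground_truth.merge(
    predictions, on=["case", "landmark"], suffixes=("_gt", "_pred")
)
merged["distance"] = np.hypot(
    merged["x_gt"] - merged["x_pred"],
    merged["y_gt"] - merged["y_pred"],
)

# One score per case: the mean landmark error over that case's rows.
per_case_error = merged.groupby("case")["distance"].mean()
print(per_case_error)
```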
Yes, as long as you output the individual case results to metrics.json, they're stored in the database, and we can then share them with you if you want to do further analysis.
An example of this is on the results page for promise12: https://promise12.grand-challenge.org/evaluation/results/41a2b7aa-b36a-4434-afa8-2c2ef8c5fa2b/
This is an individual result; as you can see, the organisers keep "apex_dice" (and other metrics) for each of the 30 cases. You probably want to store the case id too.
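To make that concrete, here is a minimal sketch of writing per-case results plus aggregates to metrics.json. The "case"/"aggregates" layout follows the convention evalutils uses, and the metric names and values are made up; check the grand-challenge evaluation docs linked above for the exact container requirements:

```python
import json

# Hypothetical per-case results: keep the case id alongside each metric
# so individual results can be traced back after the evaluation.
results = {
    "case": [
        {"case_id": "case_01", "apex_dice": 0.91},
        {"case_id": "case_02", "apex_dice": 0.87},
    ],
    "aggregates": {
        "mean_apex_dice": (0.91 + 0.87) / 2,
    },
}

# The evaluation container writes its results to /output/metrics.json
# (see the grand-challenge evaluation docs for the requirements).
with open("/output/metrics.json", "w") as f:
    json.dump(results, f, indent=2)
```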
If you use the "segmentation" option in evalutils, then this is all set up for you.
Prepare an evaluation script for ANHIR according to https://anhir.grand-challenge.org/Evaluation