comic / grand-challenge.org

A platform for end-to-end development of machine learning solutions in biomedical imaging
https://grand-challenge.org
Apache License 2.0

Serialize runtime metrics for jobs #3035

Closed chrisvanrun closed 1 year ago

chrisvanrun commented 1 year ago

In the context of transparency and sustainability, we should serialise the runtime metrics for algorithm jobs.

There is currently a runtime_metrics member on the ComponentJob model; depending on how it is populated, we can either expose it directly or parse it in the serialisation of Jobs.
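
For illustration, here is a minimal sketch of what exposing that field could look like, assuming a Django REST Framework serializer for Jobs. The import path and field list are assumptions for the sketch, not the actual grand-challenge implementation.

```python
# Hypothetical sketch, assuming the Job API uses Django REST Framework.
# Import path and field names are illustrative only.
from rest_framework import serializers

from grandchallenge.algorithms.models import Job  # assumed import path


class JobSerializer(serializers.ModelSerializer):
    # Expose the raw runtime_metrics JSON as stored on the ComponentJob model
    runtime_metrics = serializers.JSONField(read_only=True)

    class Meta:
        model = Job
        fields = ("pk", "status", "started_at", "completed_at", "runtime_metrics")
```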

jmsmkn commented 1 year ago

This is a ton of data to serialize, which would slow down the list view especially. Why is it necessary over the API? It is already accessible from the algorithm detail page.

chrisvanrun commented 1 year ago

The direct motivation is that challenge organizers (ULS23) are interested in the runtimes of the submitted algorithms. They don't want to go so far as to use them as evaluation metrics, but for their context it is relevant: segmentation following a mouse click.

I don't think organizers have direct access to the algorithms themselves without requesting access from the participants. The runtime metrics I spot-checked on one of the Jobs looked minimal. I initially thought they were aggregates (n=1), but after looking at a few more jobs I noticed they actually represent time series.

The API would allow them to aggregate the metrics themselves. However, the MVP would be a plain runtime. Is it safe to say that Job.started_at and Job.completed_at correctly correspond to the actual runtime of the algorithm itself (model loading + inference)?

jmsmkn commented 1 year ago

started_at and completed_at are already there in both the API and predictions.json and are enough to answer "how long did this algorithm take to run".
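
For reference, a minimal sketch of computing the wall-clock runtime from those two fields as returned by the API or predictions.json. The job dict shape and ISO 8601 timestamp format are assumptions for the example.

```python
# Illustrative sketch: derive a job's wall-clock runtime from the
# started_at / completed_at timestamps in the API response or predictions.json.
# Assumes the timestamps are ISO 8601 strings with an explicit UTC offset.
from datetime import datetime


def job_runtime_seconds(job: dict) -> float:
    """Return the runtime of a single job in seconds."""
    started = datetime.fromisoformat(job["started_at"])
    completed = datetime.fromisoformat(job["completed_at"])
    return (completed - started).total_seconds()


# Example usage with a job record shaped like the API output:
job = {
    "started_at": "2023-10-01T12:00:00+00:00",
    "completed_at": "2023-10-01T12:03:30+00:00",
}
print(job_runtime_seconds(job))  # 210.0
```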