This is a ton of data to serialize, which would slow down the list view especially. Why is it necessary over the API? It is already accessible from the algorithm detail page.
The direct motivation is that challenge organizers (ULS23) are interested in the runtimes of the submitted algorithms. They don't want to go so far as to use runtime as an evaluation metric, but in their context it is relevant: segmentation following a mouse click.
I don't think organizers have direct access to the algorithms themselves without requesting access from the participants. The runtime metrics I initially spot-checked on one of the Jobs looked minimal. I initially thought they were aggregates (time, n=1), but looking at a few more jobs I noticed they actually represent time series.
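For illustration, this is the kind of client-side aggregation an organizer could do once the time series is exposed. A minimal sketch: the sample shape (`timestamp`/`cpu_utilization` keys) is an assumption made for the example, not the documented structure of `runtime_metrics`.

```python
from statistics import mean

# Hypothetical time series for a single job; the real runtime_metrics
# payload on ComponentJob may be shaped differently.
runtime_metrics = [
    {"timestamp": "2024-01-01T12:00:00Z", "cpu_utilization": 41.0},
    {"timestamp": "2024-01-01T12:00:30Z", "cpu_utilization": 87.5},
    {"timestamp": "2024-01-01T12:01:00Z", "cpu_utilization": 79.2},
]

values = [sample["cpu_utilization"] for sample in runtime_metrics]
print(f"samples: {len(values)}, mean: {mean(values):.1f}%, peak: {max(values):.1f}%")
```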
The API would allow them to aggregate it themselves. However, the MVP would be runtime. Is it safe to say that `Job.started_at` and `Job.completed_at` correctly correspond to the actual runtime of the algorithm itself (model loading + inference)?
`started_at` and `completed_at` are already there in both the API and `predictions.json` and are enough to answer "how long did this algorithm take to run".
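A minimal sketch of deriving that number, assuming `predictions.json` holds a list of job objects with ISO 8601 `started_at`/`completed_at` timestamps (the file layout is an assumption here):

```python
import json
from datetime import datetime
from typing import Optional


def job_runtime_seconds(job: dict) -> Optional[float]:
    """Wall-clock runtime (model loading + inference) in seconds, or None."""
    started, completed = job.get("started_at"), job.get("completed_at")
    if not started or not completed:
        return None  # job never started or has not finished

    def parse(ts: str) -> datetime:
        # fromisoformat() on Python < 3.11 rejects a trailing "Z"
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))

    return (parse(completed) - parse(started)).total_seconds()


with open("predictions.json") as f:
    jobs = json.load(f)

runtimes = [r for r in map(job_runtime_seconds, jobs) if r is not None]
if runtimes:
    print(f"jobs: {len(runtimes)}, mean runtime: {sum(runtimes) / len(runtimes):.1f}s")
```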
In the context of transparency and sustainability, we should serialise the runtime metrics for algorithm jobs.
There is currently a `runtime_metrics` member on the `ComponentJob` model; depending on how it is populated, we can expose it directly or parse its contents in the serialisation of Jobs.
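A hedged sketch of what that serialisation could look like with Django REST framework (which grand-challenge uses). The `runtime` convenience field and the exact field list are hypothetical, and this assumes `runtime_metrics` is a plain JSON field that `ModelSerializer` can render as-is:

```python
from typing import Optional

from rest_framework import serializers

from grandchallenge.algorithms.models import Job  # assumed import path


class JobSerializer(serializers.ModelSerializer):
    # Hypothetical convenience field; started_at, completed_at and
    # runtime_metrics come from the discussion above.
    runtime = serializers.SerializerMethodField()

    class Meta:
        model = Job
        fields = ("pk", "started_at", "completed_at", "runtime", "runtime_metrics")

    def get_runtime(self, obj) -> Optional[float]:
        # MVP: wall-clock duration in seconds; None while the job is running
        if obj.started_at and obj.completed_at:
            return (obj.completed_at - obj.started_at).total_seconds()
        return None
```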