clamsproject / aapb-evaluations

Collection of evaluation codebases
Apache License 2.0

define "evaluator" interface #9

Open keighrim opened 1 year ago

keighrim commented 1 year ago

(subtask of #3)

We'd like to define a minimal but concrete behavior for the class of "evaluator" objects. Some features are also discussed in https://github.com/clamsproject/aapb-annotations/issues/2#issuecomment-1542748851. At the very minimum, an "evaluator" should be able to

  1. take a batch of gold files and a batch of prediction files, and return a single HTML file with the evaluation result
  2. take multiple batches of gold and prediction files, and return a single HTML file with all the per-batch evaluation results plus an aggregated result.
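The two behaviors above could be sketched as an abstract base class. This is only a hypothetical sketch to make the discussion concrete; the class, method, and parameter names (`Evaluator`, `evaluate`, `evaluate_all`) are assumptions, not an agreed-upon API:

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterable, Tuple


class Evaluator(ABC):
    """Hypothetical minimal interface for an AAPB evaluator."""

    @abstractmethod
    def evaluate(self, golds: Iterable[Path], preds: Iterable[Path]) -> str:
        """Evaluate one batch of gold files against one batch of
        prediction files and return the report as a single HTML string."""

    def evaluate_all(
        self, batches: Iterable[Tuple[Iterable[Path], Iterable[Path]]]
    ) -> str:
        """Evaluate multiple (golds, preds) batches and return one HTML
        report that concatenates the per-batch results; a real
        implementation would also compute an aggregated result here."""
        sections = [self.evaluate(golds, preds) for golds, preds in batches]
        return "<html><body>\n" + "\n".join(sections) + "\n</body></html>"
```

A concrete evaluator (e.g. for NER or slate detection) would then only need to implement `evaluate` for a single batch, and would inherit the multi-batch aggregation behavior.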

Gold files are freely accessible from the https://github.com/clamsproject/aapb-annotations repository, but predictions files almost always need to be generated on demand, and in many cases (vision, audio apps) generating predictions will take hours, if not days, even with a small size batch. But running CLAMS pipelines, waiting for the generation for predictions (MMIF), and finally obtaining those MMIF files should not be responsibility of evaluators, but instead the evaluation "runner" or "invoker" should take charge of obtaining all golds and preds files before an evaluator runs.