Definition of the benchmarking metrics

OCR-D / zenhub

Definition of the benchmarking metrics #125

Open mweidling opened 2 years ago

mweidling commented 2 years ago

We have identified the following metrics to be relevant for benchmarking:

Bag of Words
CER/WER
Flexible CER
Reading Order
IoU
mAP
CPU time
wall time
I/O
memory usage
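
As a rough illustration of three of the metrics listed above, here is a minimal sketch of CER, WER and IoU. These are the commonly used textbook formulations, not necessarily the definitions that end up in the draft or the OCR-D specs. The function names (`levenshtein`, `cer`, `wer`, `iou`) are hypothetical, normalizing by the length of the reference is only one common convention, and the IoU is restricted to axis-aligned boxes for simplicity.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (insert, delete, substitute: cost 1 each)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length (assumed convention)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: the same idea on whitespace-separated tokens (assumed tokenization)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)


def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

With this normalization the CER can exceed 1 when the hypothesis is much longer than the reference, which is one of the points a proper definition would have to settle.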

So that we and our users are clear about exactly what we mean by these terms, we have to define them properly and add them to the OCR-D specs.

Prior Art: https://pad.gwdg.de/3S_yuzyERum4WQChxV6UyQ

Link to draft: https://pad.gwdg.de/rLDBVhmYQ8CwOd67KcYHwQ#

[x] define each metric
[ ] add them to the specs

mweidling commented 2 years ago

See https://pad.gwdg.de/rLDBVhmYQ8CwOd67KcYHwQ# for the current status.

mweidling commented 2 years ago

@kba @cneud

My first draft of the metrics is ready. Could you please have a look at it? There are still some open TODOs marking points that we should talk about or still need to define.

I intentionally left the "scenario based layout evaluation" section empty because, as far as I understand the linked paper, this is not a metric in the narrower sense. Maybe we could talk about this as well.

cneud commented 2 years ago

Thank you @mweidling! I've added a new top level and slightly restructured the draft to make the distinction between the evaluation of text, layout and resource utilization clearer, and added some introductory remarks for those sections. It looks very good otherwise. I guess we can have one more call and then publish a first version to the spec.

mweidling commented 2 years ago

Thank you for your feedback and work, @cneud! I'll schedule a call then.