OCR-D / zenhub

Repo for developing zenhub integration
Apache License 2.0

Add diachronic information to QuiVer #148

Open mweidling opened 1 year ago

mweidling commented 1 year ago

Describe the feature you'd like

Currently, QuiVer displays only the latest information about a workflow. When we run a workflow A, its important metrics are saved, but they are overwritten the next time we run workflow A.

In order to measure how changes in the OCR-D software impact the OCR quality as well as the hardware statistics, we should introduce diachronic information to QuiVer, e.g. via a timestamp.

User story

As a developer, I need an overview of how changes in the software affect the OCR quality and hardware metrics in order to be certain that the newest contributions to OCR-D really improve the software's outcome.

Ideas we have discussed so far

How to display the information

For each available GT corpus there should be a line chart that depicts how a metric has changed over time. Each step in time (x axis) represents an ocrd_all or an ocrd_core release (clarified: ocrd_all; see comments). Users can choose between the different metrics and see whether a metric tends to improve or not.
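To make the chart idea more concrete, here is a minimal sketch of how the points for one line could be derived from a list of runs. Everything in it is an assumption for illustration only: the field names (`release`, `timestamp`, `metrics`), the metric key `cer`, the release labels and all values are made up and are not QuiVer's actual schema or data.

```python
# Hypothetical runs of one workflow on one GT corpus (invented values).
RUNS = [
    {"release": "r2", "timestamp": "2023-03-01T10:00:00", "metrics": {"cer": 0.07}},
    {"release": "r1", "timestamp": "2023-01-15T09:30:00", "metrics": {"cer": 0.08}},
    {"release": "r3", "timestamp": "2023-05-20T14:45:00", "metrics": {"cer": 0.06}},
]


def metric_series(runs, metric):
    """Return (release, value) pairs for one metric, oldest run first."""
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    ordered = sorted(runs, key=lambda run: run["timestamp"])
    return [
        (run["release"], run["metrics"][metric])
        for run in ordered
        if metric in run["metrics"]
    ]


def tendency(series, lower_is_better=True):
    """Compare the first and the last value of a series."""
    if len(series) < 2:
        return "n/a"
    first, last = series[0][1], series[-1][1]
    if first == last:
        return "unchanged"
    got_lower = last < first
    return "improved" if got_lower == lower_is_better else "worsened"


series = metric_series(RUNS, "cer")
print(series)
print(tendency(series))
```

The x axis would then be the release labels in `series` and the y axis the metric values. Comparing only the first and last point is a deliberately crude tendency measure; a real chart would simply show all points.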

Underlying data structure

When a GT corpus is selected, the front end uses an ID map file that points it to the right collection of JSON objects. Each OCR-D workflow that is executed on a GT corpus has a separate file that contains all of its runs across releases.

Take the GT workspace 16_ant_simple as an example. We then have a file 16_ant_simple_minimal.json with all its benchmarking runs, a file 16_ant_simple_selected_pages.json with all its benchmarking runs, etc. Each executed workflow has a timestamp by which the front end can sort the individual executions and retrieve the relevant data.
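A rough sketch of this structure, written in Python for brevity even though the real consumer is the front end. The shape of the ID map, the field names (`release`, `timestamp`, `metrics`), the release labels and all values are assumptions made up for illustration; only the workspace and file names come from the description above.

```python
import json
from datetime import datetime

# Hypothetical ID map: GT workspace ID -> one JSON file per workflow.
ID_MAP = {
    "16_ant_simple": [
        "16_ant_simple_minimal.json",
        "16_ant_simple_selected_pages.json",
    ]
}

# Hypothetical content of 16_ant_simple_minimal.json (invented values):
# one entry per run, each carrying its own timestamp.
MINIMAL_JSON = """
[
  {"release": "r2", "timestamp": "2023-03-01T10:00:00", "metrics": {"cer": 0.07}},
  {"release": "r1", "timestamp": "2023-01-15T09:30:00", "metrics": {"cer": 0.08}},
  {"release": "r3", "timestamp": "2023-05-20T14:45:00", "metrics": {"cer": 0.06}}
]
"""


def sort_runs(runs):
    """Order runs chronologically by their timestamp."""
    return sorted(runs, key=lambda run: datetime.fromisoformat(run["timestamp"]))


runs = sort_runs(json.loads(MINIMAL_JSON))
releases = [run["release"] for run in runs]
latest = runs[-1]
print(ID_MAP["16_ant_simple"])
print(releases, latest["metrics"])
```

Keeping one file per workflow means that a new run only appends an entry to a single file, and the front end can load just the workflows the user actually selects.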

TODOs

paulpestov commented 1 year ago

Here is the first draft according to this description: [image: Workflow Runs List]

mweidling commented 1 year ago

clarify what our steps / increments in time are. A release of ocrd_all? A release of ocrd_core?

According to @kba this doesn't matter much so I will opt for ocrd_all.

cneud commented 1 year ago

+1 for basing this on ocrd_all.