OCR-D / zenhub

Repo for developing zenhub integration
Apache License 2.0

benchmarking spike 2: QA Spec for evaluation processors #123

Open krvoigt opened 2 years ago

krvoigt commented 2 years ago

Concept for benchmarking / data for the workflow tab

Data

In a discussion we identified the following properties of a workspace as crucial: publication date, font, layout, pages.

Metadata for data sets

Next steps for data

Ground Truths

:question: At this point I'm not sure if we simply use an existing GT or create ones ourselves.

Workflows

The main idea of the workflow tab is to enable OCR-D users to identify suitable workflows for their data (where suitability means CER/WER and/or performance of the workflow). Since we have a lot of processors, it's not feasible to perform a simple permutation of all processors for all data sets. A good starting point might be to use the findings and recommendations the KIT had in the second project phase combined with examples obtained from people using OCR-D on a daily basis (Maria?).

The first evaluation of the workflow results could be done with dinglehopper, which is suitable for simple text evaluation.
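For illustration, the metric underlying such a text evaluation can be sketched in a few lines. This is not dinglehopper's implementation (which handles grapheme clusters and PAGE-XML input); it is a minimal sketch of CER as edit distance over the ground truth length:

```python
# Minimal sketch of character error rate (CER) via Levenshtein edit
# distance. dinglehopper computes CER more carefully (grapheme
# clusters, PAGE-XML input); this only illustrates the metric itself.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(ground_truth: str, ocr_text: str) -> float:
    """CER = edit distance / length of the ground truth."""
    if not ground_truth:
        return 0.0 if not ocr_text else 1.0
    return levenshtein(ground_truth, ocr_text) / len(ground_truth)
```

WER works the same way over token sequences instead of characters.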

Next steps for workflows

Getting the data relevant for the front end

JSON Output

The dashboard should be fed with JSON containing all relevant information. A first draft of the data looks like this:

[
    {
        "workflow-id": "1",
        "ocrd-workspace": "https://some-url-pointing-to.a/mets.xml",
        "properties":
            {
                "font": "antiqua",
                "date-of-creation": "19. century",
                "no-of-pages": "100",
                "layout": "simple"
            },
        "workflow-metrics": "https://link-to-nextflow-results.com",
        "cer_total": "5.7",
        "cer_per_page": "0.92",
        "time_per_page_in_seconds": "15"        
    }
]
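As a sanity check, the draft above can already be consumed on the client side. A minimal sketch (note that the numeric values are encoded as strings in the draft, so a client has to convert them; the second entry is made up only to have something to sort):

```python
import json

# Minimal sketch of consuming the draft output above: parse the JSON
# and rank workflows by total CER. The second entry is a hypothetical
# example added only to have something to sort.
draft = '''
[
    {"workflow-id": "1", "cer_total": "5.7", "time_per_page_in_seconds": "15"},
    {"workflow-id": "2", "cer_total": "3.1", "time_per_page_in_seconds": "20"}
]
'''

workflows = json.loads(draft)
# cer_total is a string in the draft, so convert before sorting
by_cer = sorted(workflows, key=lambda wf: float(wf["cer_total"]))
print(by_cer[0]["workflow-id"])  # prints "2", the workflow with the lowest CER
```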

… and how to get it (maybe not in this spike)

In order to get a better understanding of how this is done, I will probably have to have a look at Nextflow and Mehmed's findings first.

Decide on metrics for text and layout and communicate them. Goal: measurability (is the OCR better or worse?)

AC for this sprint (30.08.2022)

mweidling commented 1 year ago

@kba @paulpestov

Could you please give me some feedback regarding the JSON output suggested above?

paulpestov commented 1 year ago

Thank you for your example of the workflow JSON. I have questions about the combination of different aspects that we discussed on the basis of the Workflow Tab Mockup. There we have a matrix view on the workflows, grouped by benchmark types (in our case now "metrics"?) and models. As far as I understand it, you would return an array of workflow objects like the one above, right? How would we achieve that matrix view then? By which attributes should we group the workflows? How do the crucial properties that you describe above play into the matrix view? In addition to that, is it possible to integrate the Nextflow results into that JSON in order to have a more standardized response?

mweidling commented 1 year ago

As far as I understand it, you would return an array of workflow objects like the one above, right? How would we achieve that matrix view then? By which attributes should we group the workflows? How do the crucial properties that you describe above play into the matrix view?

As far as I understood the draft, users should be able to select different benchmark metrics according to their needs. Therefore I chose a rather flat nesting for the relevant metrics. My understanding was that the front end takes care of sorting and displaying the workflows. Does that shift too much work to the web app?

In addition to that, is it possible to integrate the Nextflow results into that JSON in order to have a more standardized response?

I can add the relevant findings of Nextflow to the JSON output we produce instead of a URL, if that is what you mean. If not, could you elaborate?

paulpestov commented 1 year ago

My understanding was that the front end takes care of this sorting and displaying the workflows.

Yea, so your array structure is fine. Also we have a list view of workflows where the array response is also useful.

... users should be able to select different benchmark metrics according to their needs.

But here I'm not quite sure. What we thought is to provide some filters; is that what you mean? My questions were rather about the actual properties of the workflow array items, like how do we represent the different combinations of workflow A, model A, metric A, you know?

I can add the relevant findings of Nextflow to the JSON output we produce instead of an URL if that is what you mean

Yes, that would be good.

mweidling commented 1 year ago

But here I'm not quite sure. What we thought is to provide some filters; is that what you mean?

Yes.

My questions were rather about the actual properties of the workflow array items. Like how do we represent the different combinations of workflow A, model A, metric A, you know?

I think we have to add two properties, the model and the workflow_steps, like so:

[
    {
        "workflow-id": "1",
        "workflow_steps":
            {
                "0": "Processor A",
                "1": "Processor B"
            },
        "ocrd-workspace": "https://some-url-pointing-to.a/mets.xml",
        "properties":
            {
                "font": "antiqua",
                "date-of-creation": "19. century",
                "no-of-pages": "100",
                "layout": "simple"
            },
        "wall_time": 1234,
        "cer_total": "5.7",
        "cer_per_page": "0.92",
        "time_per_page_in_seconds": 15       
    }
]

Is this what you mean? Or do you want several JSON files for each representation?

paulpestov commented 1 year ago

Is this what you mean? Or do you want several JSON files for each representation?

It might be an idea to have individual arrays for different things, like:


{
   "models": [
      {
           "id": 1,
           "name": "Model A"
      }
   ],
   "works": [
      {
           "id": 1,
           "name": "Work A"
      }
   ],
   "workflows": [
      {
           "id": 1,
           // ...
      }
   ]
}

Everything would be referenced by id then. So we can easily create filters out of it. Maybe also bookmark a certain filter by appending it to the URL &work=1. What do you think?
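To illustrate, resolving such id references into a filter could look like this on the client side. Note that the `work_id` link field on the workflow items is an assumption for this sketch; the draft above only shows the top-level arrays:

```python
# Hedged sketch of filtering the normalized response above by id
# reference, e.g. for a bookmarked query like work=1. The work_id
# field linking workflows to works is assumed, not part of the draft.
response = {
    "models": [{"id": 1, "name": "Model A"}],
    "works": [{"id": 1, "name": "Work A"}, {"id": 2, "name": "Work B"}],
    "workflows": [
        {"id": 1, "work_id": 1},
        {"id": 2, "work_id": 2},
    ],
}

def workflows_for_work(data: dict, work_id: int) -> list[dict]:
    """Return the workflow items that reference the given work id."""
    return [wf for wf in data["workflows"] if wf.get("work_id") == work_id]

print(workflows_for_work(response, 1))  # [{'id': 1, 'work_id': 1}]
```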

The workflow steps are an additional topic, I think, but yes, we also need them. For me they are not that critical for creating the main functionality of the front end; they just add some additional info about the workflow.

More importantly, I would consider (if our plans haven't changed) the basic rendering of the list and matrix views. So properties would be our work (like work A; maybe rename properties to work?), and here we could also use some title for the display. And yes, we would need some model attribute with an object as value, similar to properties. What info do we have from the model?

mweidling commented 1 year ago

After a discussion we came to the conclusion that we stick with the originally proposed format, but

mweidling commented 1 year ago
[
  {
    "workflow_id": "wf1-data345-eval1",
    "label": "Workflow 1 on Data 345",
    "metadata": {
      "workflow": "https://example.org/workflow/1",
      "workflow_steps": {
        "0": "Processor A",
        "1": "Processor B"
      },
      "workflow_model": "Fraktur_GT4HistOCR",
      "eval_workflow": "https://example.org/workflow/eval1",
      "eval_data": "https://example.org/workspace/345",
      "eval_tool": "dinglehopper",
      "gt_data": "https://gt.ocr-d.de/workspace/789",
      "document": {
        "fonts": ["antiqua", "fraktur"],
        "publication_year": "19. century",
        "number_of_pages": "100",
        "layout": "simple"
      }
    },
    "evaluations": {
      "document_wide": {
        "wall_time": 1234,
        "cer": 0.57,
        "cer_min_max": [0.2, 0.57]
      },
      "by_page": [
        {
          "page_id": "PHYS_0001",
          "cer": 0.8,
          "processing_time": 2.1
        }
      ]
    }
  },
  {
    "workflow_id": "wf2-data345-eval1",
    "label": "Workflow 2 on Data 345",
    "metadata": {
      "workflow": "https://example.org/workflow/2",
      "workflow_steps": {
        "0": "Processor A",
        "1": "Processor B"
      },
      "workflow_model": "Fraktur_GT4HistOCR",
      "eval_workflow": "https://example.org/workflow/eval1",
      "eval_data": "https://example.org/workspace/345",
      "eval_tool": "dinglehopper",
      "gt_data": "https://gt.ocr-d.de/workspace/789",
      "document": {
        "fonts": ["antiqua", "fraktur"],
        "publication_year": "19. century",
        "number_of_pages": "100",
        "layout": "simple"
      }
    },
    "evaluations": {
      "document_wide": {
        "wall_time": 1234,
        "cer": 0.88,
        "cer_min_max": [0.2, 0.57]
      },
      "by_page": [
        {
          "page_id": "PHYS_0001",
          "cer": 0.9,
          "processing_time": 2.0
        }
      ]
    }
  }
]

@kba @paulpestov I tried to integrate Konstantin's proposal and also added some keys that might come in handy for the front-end app (e.g. workflow_model or eval_tool, which could be used to display more information in the workflow tab).

EDIT: I added the second example of ocrd_eval.sample.yml.
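To come back to the matrix view question from earlier: a client could pivot entries of this shape into a workflow × metric table. A minimal sketch against the schema above (which metrics to show is illustrative and up to the front end):

```python
# Hedged sketch: pivoting evaluation entries of the proposed shape
# into a matrix view (workflow x metric) for the workflow tab. Keys
# follow the draft schema; the metric selection is illustrative.
entries = [
    {"workflow_id": "wf1-data345-eval1",
     "evaluations": {"document_wide": {"wall_time": 1234, "cer": 0.57}}},
    {"workflow_id": "wf2-data345-eval1",
     "evaluations": {"document_wide": {"wall_time": 1234, "cer": 0.88}}},
]

def metric_matrix(entries, metrics=("cer", "wall_time")):
    """Map each workflow id to its document-wide values for the given metrics."""
    return {
        e["workflow_id"]: {m: e["evaluations"]["document_wide"].get(m) for m in metrics}
        for e in entries
    }

print(metric_matrix(entries)["wf1-data345-eval1"])  # {'cer': 0.57, 'wall_time': 1234}
```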

kba commented 1 year ago

What is the difference between workflow_id in the top level and workflow in metadata?

I assume the former is the ID of the evaluation workflow and the latter of the data generation workflow? Since both are to be Nextflow scripts, they should both be addressable as URLs via the Web API. Maybe we can rename them eval_workflow and ocr_workflow?

mweidling commented 1 year ago

What is the difference between workflow_id in the top level and workflow in metadata?

I assume the former is the ID of the evaluation workflow and the latter of the data generation workflow? Since both are to be Nextflow scripts, they should both be addressable as URLs via the Web API. Maybe we can rename them eval_workflow and ocr_workflow?

I assumed that what you proposed in https://github.com/OCR-D/spec/compare/master...qa-spec#diff-3ca00602cf767fb4a01ea3267035a87437cc087ccf4aecb252942431e9e1411bR1 should be the ID of a discrete evaluation workflow and just adopted the rest. :D So yeah, maybe we should be a bit more explicit with our keys.

mweidling commented 1 year ago

@kba What about this:

[
  {
    "eval_workflow_id": "wf1-data345-eval1",
    "label": "Workflow 1 on Data 345", // for UI display
    "metadata": {
      "data_creation_workflow": "https://example.org/workflow/1",
      "workflow_steps": {
        "0": "Processor A",
        "1": "Processor B"
      },
      "workflow_model": "Fraktur_GT4HistOCR", // for UI display
      "eval_workflow_url": "https://example.org/workflow/eval1",
      "eval_data": "https://example.org/workspace/345",
      "eval_tool": "dinglehopper",
      "gt_data": "https://gt.ocr-d.de/workspace/789",
      "data_properties": {
        "fonts": ["antiqua", "fraktur"],
        "publication_year": "19. century",
        "number_of_pages": "100",
        "layout": "simple"
      }
    },
    "evaluation_results": {
      "document_wide": {
        "wall_time": 1234,
        "cer": 0.57,
        "cer_min_max": [0.2, 0.57]
      },
      "by_page": [
        {
          "page_id": "PHYS_0001",
          "cer": 0.8,
          "processing_time": 2.1
        }
      ]
    }
  },
  {
    "eval_workflow_id": "wf2-data345-eval1",
    "label": "Workflow 2 on Data 345",
    "metadata": {
      "data_creation_workflow": "https://example.org/workflow/2",
      "workflow_steps": {
        "0": "Processor A",
        "1": "Processor B"
      },
      "workflow_model": "Fraktur_GT4HistOCR",
      "eval_workflow_url": "https://example.org/workflow/eval1",
      "eval_data": "https://example.org/workspace/345",
      "eval_tool": "dinglehopper",
      "gt_data": "https://gt.ocr-d.de/workspace/789",
      "data_properties": {
        "fonts": ["antiqua", "fraktur"],
        "publication_year": "19. century",
        "number_of_pages": "100",
        "layout": "simple"
      }
    },
    "evaluation_results": {
      "document_wide": {
        "wall_time": 1234,
        "cer": 0.88,
        "cer_min_max": [0.2, 0.57]
      },
      "by_page": [
        {
          "page_id": "PHYS_0001",
          "cer": 0.9,
          "processing_time": 2.0
        }
      ]
    }
  }
]

Changes: