OCR-D / zenhub

Repo for developing zenhub integration
Apache License 2.0

benchmarking spike 2: QA Spec for evaluation processors #123

Open krvoigt opened 2 years ago

krvoigt commented 2 years ago

Concept for benchmarking / data for the workflow tab

Data

In a discussion we identified the following properties of a workspace as crucial: publication date, font, layout, pages.

Metadata for data sets

Next steps for data

Ground Truths

:question: At this point I'm not sure if we simply use an existing GT or create ones ourselves.

Workflows

The main idea of the workflow tab is to enable OCR-D users to identify suitable workflows for their data (where suitability means CER/WER and/or performance of the workflow). Since we have a lot of processors, it's not feasible to perform a simple permutation of all processors for all data sets. A good starting point might be to use the findings and recommendations the KIT had in the second project phase combined with examples obtained from people using OCR-D on a daily basis (Maria?).

The first evaluation of the workflow results could be done with dinglehopper, which is suitable for simple text evaluation.
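For illustration, the metric underlying such a text evaluation can be sketched in a few lines. This is not dinglehopper's implementation (which handles grapheme clusters and PAGE-XML input); it is a minimal sketch of CER as edit distance over the ground truth length:

```python
# Minimal sketch of character error rate (CER) via Levenshtein edit
# distance. dinglehopper computes CER more carefully (grapheme
# clusters, PAGE-XML input); this only illustrates the metric itself.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(ground_truth: str, ocr_text: str) -> float:
    """CER = edit distance / length of the ground truth."""
    if not ground_truth:
        return 0.0 if not ocr_text else 1.0
    return levenshtein(ground_truth, ocr_text) / len(ground_truth)
```

WER works the same way over token sequences instead of characters.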

Next steps for workflows

Getting the data relevant for the front end

JSON Output

The dashboard should be fed with JSON containing all relevant information. A first draft of the data looks like this:

[
    {
        "workflow-id": "1",
        "ocrd-workspace": "https://some-url-pointing-to.a/mets.xml",
        "properties":
            {
                "font": "antiqua",
                "date-of-creation": "19. century",
                "no-of-pages": "100",
                "layout": "simple"
            },
        "workflow-metrics": "https://link-to-nextflow-results.com",
        "cer_total": "5.7",
        "cer_per_page": "0.92",
        "time_per_page_in_seconds": "15"        
    }
]
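As a sanity check, the draft above can already be consumed on the client side. A minimal sketch (note that the numeric values are encoded as strings in the draft, so a client has to convert them; the second entry is made up only to have something to sort):

```python
import json

# Minimal sketch of consuming the draft output above: parse the JSON
# and rank workflows by total CER. The second entry is a hypothetical
# example added only to have something to sort.
draft = '''
[
    {"workflow-id": "1", "cer_total": "5.7", "time_per_page_in_seconds": "15"},
    {"workflow-id": "2", "cer_total": "3.1", "time_per_page_in_seconds": "20"}
]
'''

workflows = json.loads(draft)
# cer_total is a string in the draft, so convert before sorting
by_cer = sorted(workflows, key=lambda wf: float(wf["cer_total"]))
print(by_cer[0]["workflow-id"])  # prints "2", the workflow with the lowest CER
```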

… and how to get it (maybe not in this spike)

In order to get a better understanding of how this is done, I will probably have to have a look at Nextflow and Mehmed's findings first.

Decide on metrics for text and layout and communicate them. Goal: measurability (is the OCR better or worse?)

AC for this sprint (30.08.2022)

mweidling commented 1 year ago

@kba @paulpestov

Could you please give me some feedback regarding the JSON output suggested above?

paulpestov commented 1 year ago

Thank you for your example of the workflow JSON. I have questions about the combination of different aspects that we discussed on the basis of the Workflow Tab Mockup. There we have a matrix view on the workflows, grouped by benchmark types (in our case now "metrics"?) and models. As far as I understand it, you would return an array of workflow objects like the one above, right? How would we achieve that matrix view then? By which attributes should we group the workflows? How do the crucial properties that you describe above play into the matrix view? In addition to that, is it possible to integrate the Nextflow results into that JSON in order to have a more standardized response?

mweidling commented 1 year ago

As far as I understand it, you would return an array of workflow objects like the one above, right? How would we achieve that matrix view then? By which attributes should we group the workflows? How do the crucial properties that you describe above play into the matrix view?

As far as I understood the draft, users should be able to select different benchmark metrics according to their needs. Therefore I chose a rather flat nesting for the relevant metrics. My understanding was that the front end takes care of sorting and displaying the workflows. Does that shift too much work to the web app?

In addition to that, is it possible to integrate the Nextflow results into that JSON in order to have a more standardized response?

I can add the relevant findings of Nextflow to the JSON output we produce instead of a URL, if that is what you mean. If not, could you elaborate?

paulpestov commented 1 year ago

My understanding was that the front end takes care of this sorting and displaying the workflows.

Yea, so your array structure is fine. Also we have a list view of workflows where the array response is also useful.

... users should be able to select different benchmark metrics according to their needs.

But here I'm not quite sure. What we thought is to provide some filters; is that what you mean? My questions were rather about the actual properties of the workflow array items, like how do we represent the different combinations of workflow A, model A, metric A, you know?

I can add the relevant findings of Nextflow to the JSON output we produce instead of an URL if that is what you mean

Yes, that would be good.

mweidling commented 1 year ago

But here I'm not quite sure. What we thought is to provide some filters; is that what you mean?

Yes.

My questions were rather about the actual properties of the workflow array items. Like how do we represent the different combinations of workflow A, model A, metric A, you know?

I think we have to add two properties, the model and the workflow_steps, like so:

[
    {
        "workflow-id": "1",
        "workflow_steps":
            {
                "0": "Processor A",
                "1": "Processor B"
            },
        "ocrd-workspace": "https://some-url-pointing-to.a/mets.xml",
        "properties":
            {
                "font": "antiqua",
                "date-of-creation": "19. century",
                "no-of-pages": "100",
                "layout": "simple"
            },
        "wall_time": 1234,
        "cer_total": "5.7",
        "cer_per_page": "0.92",
        "time_per_page_in_seconds": 15       
    }
]

Is this what you mean? Or do you want several JSON files for each representation?

paulpestov commented 1 year ago

Is this what you mean? Or do you want several JSON files for each representation?

It might be an idea to have individual arrays for different things, like:


{
   "models": [
      {
           "id": 1,
           "name": "Model A"
      }
   ],
   "works": [
      {
           "id": 1,
           "name": "Work A"
      }
   ],
   "workflows": [
      {
           "id": 1,
           // ...
      }
   ]
}

Everything would be referenced by id then. So we can easily create filters out of it. Maybe also bookmark a certain filter by appending it to the URL &work=1. What do you think?
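To illustrate, resolving such id references into a filter could look like this on the client side. Note that the `work_id` link field on the workflow items is an assumption for this sketch; the draft above only shows the top-level arrays:

```python
# Hedged sketch of filtering the normalized response above by id
# reference, e.g. for a bookmarked query like work=1. The work_id
# field linking workflows to works is assumed, not part of the draft.
response = {
    "models": [{"id": 1, "name": "Model A"}],
    "works": [{"id": 1, "name": "Work A"}, {"id": 2, "name": "Work B"}],
    "workflows": [
        {"id": 1, "work_id": 1},
        {"id": 2, "work_id": 2},
    ],
}

def workflows_for_work(data: dict, work_id: int) -> list[dict]:
    """Return the workflow items that reference the given work id."""
    return [wf for wf in data["workflows"] if wf.get("work_id") == work_id]

print(workflows_for_work(response, 1))  # [{'id': 1, 'work_id': 1}]
```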

The workflow steps are an additional topic, I think, but yes, we also need them. For me they are not that critical for creating the main functionality of the front end; they just add some additional info about the workflow.

More importantly, I would consider (if our plans haven't changed) the basic rendering of the list and matrix views. So properties would be our work (like work A; maybe rename properties to work?), and here we could also use some title for the display. And yes, we would need some model attribute with an object as value, similar to properties. What info do we have from the model?

mweidling commented 1 year ago

After a discussion we came to the conclusion that we stick with the originally proposed format, but

mweidling commented 1 year ago
[
  {
    "workflow_id": "wf1-data345-eval1",
    "label": "Workflow 1 on Data 345",
    "metadata": {
      "workflow": "https://example.org/workflow/1",
      "workflow_steps": {
        "0": "Processor A",
        "1": "Processor B"
      },
      "workflow_model": "Fraktur_GT4HistOCR",
      "eval_workflow": "https://example.org/workflow/eval1",
      "eval_data": "https://example.org/workspace/345",
      "eval_tool": "dinglehopper",
      "gt_data": "https://gt.ocr-d.de/workspace/789",
      "document": {
        "fonts": ["antiqua", "fraktur"],
        "publication_year": "19. century",
        "number_of_pages": "100",
        "layout": "simple"
      }
    },
    "evaluations": {
      "document_wide": {
        "wall_time": 1234,
        "cer": 0.57,
        "cer_min_max": [0.2, 0.57]
      },
      "by_page": [
        {
          "page_id": "PHYS_0001",
          "cer": 0.8,
          "processing_time": 2.1
        }
      ]
    }
  },
  {
    "workflow_id": "wf2-data345-eval1",
    "label": "Workflow 2 on Data 345",
    "metadata": {
      "workflow": "https://example.org/workflow/2",
      "workflow_steps": {
        "0": "Processor A",
        "1": "Processor B"
      },
      "workflow_model": "Fraktur_GT4HistOCR",
      "eval_workflow": "https://example.org/workflow/eval1",
      "eval_data": "https://example.org/workspace/345",
      "eval_tool": "dinglehopper",
      "gt_data": "https://gt.ocr-d.de/workspace/789",
      "document": {
        "fonts": ["antiqua", "fraktur"],
        "publication_year": "19. century",
        "number_of_pages": "100",
        "layout": "simple"
      }
    },
    "evaluations": {
      "document_wide": {
        "wall_time": 1234,
        "cer": 0.88,
        "cer_min_max": [0.2, 0.57]
      },
      "by_page": [
        {
          "page_id": "PHYS_0001",
          "cer": 0.9,
          "processing_time": 2.0
        }
      ]
    }
  }
]

@kba @paulpestov I tried to integrate Konstantin's proposal and also added some keys that might come in handy for the front-end app (e.g. workflow_model or eval_tool, which could be used to display more information in the workflow tab).

EDIT: I added the second example of ocrd_eval.sample.yml.
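To come back to the matrix view question from earlier: a client could pivot entries of this shape into a workflow × metric table. A minimal sketch against the schema above (which metrics to show is illustrative and up to the front end):

```python
# Hedged sketch: pivoting evaluation entries of the proposed shape
# into a matrix view (workflow x metric) for the workflow tab. Keys
# follow the draft schema; the metric selection is illustrative.
entries = [
    {"workflow_id": "wf1-data345-eval1",
     "evaluations": {"document_wide": {"wall_time": 1234, "cer": 0.57}}},
    {"workflow_id": "wf2-data345-eval1",
     "evaluations": {"document_wide": {"wall_time": 1234, "cer": 0.88}}},
]

def metric_matrix(entries, metrics=("cer", "wall_time")):
    """Map each workflow id to its document-wide values for the given metrics."""
    return {
        e["workflow_id"]: {m: e["evaluations"]["document_wide"].get(m) for m in metrics}
        for e in entries
    }

print(metric_matrix(entries)["wf1-data345-eval1"])  # {'cer': 0.57, 'wall_time': 1234}
```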

kba commented 1 year ago

What is the difference between workflow_id in the top level and workflow in metadata?

I assume the former is the ID of the evaluation workflow and the latter of the data generation workflow? Since both are to be Nextflow scripts, they should both be addressable as URLs via the Web API. Maybe we can rename them eval_workflow and ocr_workflow?

mweidling commented 1 year ago

What is the difference between workflow_id in the top level and workflow in metadata?

I assume the former is the ID of the evaluation workflow and the latter of the data generation workflow? Since both are to be Nextflow scripts, they should both be addressable as URLs via the Web API. Maybe we can rename them eval_workflow and ocr_workflow?

I assumed that what you proposed in https://github.com/OCR-D/spec/compare/master...qa-spec#diff-3ca00602cf767fb4a01ea3267035a87437cc087ccf4aecb252942431e9e1411bR1 should be the ID of a discrete evaluation workflow and just adopted the rest. :D So yeah, maybe we should be a bit more explicit with our keys.

mweidling commented 1 year ago

@kba What about this:

[
  {
    "eval_workflow_id": "wf1-data345-eval1",
    "label": "Workflow 1 on Data 345", // for UI display
    "metadata": {
      "data_creation_workflow": "https://example.org/workflow/1",
      "workflow_steps": {
        "0": "Processor A",
        "1": "Processor B"
      },
      "workflow_model": "Fraktur_GT4HistOCR", // for UI display
      "eval_workflow_url": "https://example.org/workflow/eval1",
      "eval_data": "https://example.org/workspace/345",
      "eval_tool": "dinglehopper",
      "gt_data": "https://gt.ocr-d.de/workspace/789",
      "data_properties": {
        "fonts": ["antiqua", "fraktur"],
        "publication_year": "19. century",
        "number_of_pages": "100",
        "layout": "simple"
      }
    },
    "evaluation_results": {
      "document_wide": {
        "wall_time": 1234,
        "cer": 0.57,
        "cer_min_max": [0.2, 0.57]
      },
      "by_page": [
        {
          "page_id": "PHYS_0001",
          "cer": 0.8,
          "processing_time": 2.1
        }
      ]
    }
  },
  {
    "eval_workflow_id": "wf2-data345-eval1",
    "label": "Workflow 2 on Data 345",
    "metadata": {
      "data_creation_workflow": "https://example.org/workflow/2",
      "workflow_steps": {
        "0": "Processor A",
        "1": "Processor B"
      },
      "workflow_model": "Fraktur_GT4HistOCR",
      "eval_workflow_url": "https://example.org/workflow/eval1",
      "eval_data": "https://example.org/workspace/345",
      "eval_tool": "dinglehopper",
      "gt_data": "https://gt.ocr-d.de/workspace/789",
      "data_properties": {
        "fonts": ["antiqua", "fraktur"],
        "publication_year": "19. century",
        "number_of_pages": "100",
        "layout": "simple"
      }
    },
    "evaluation_results": {
      "document_wide": {
        "wall_time": 1234,
        "cer": 0.88,
        "cer_min_max": [0.2, 0.57]
      },
      "by_page": [
        {
          "page_id": "PHYS_0001",
          "cer": 0.9,
          "processing_time": 2.0
        }
      ]
    }
  }
]

Changes: