HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0

Extend OCR template to better support token classification #3447

Open mbuet2ner opened 1 year ago

mbuet2ner commented 1 year ago

Is your feature request related to a problem? Please describe. TLDR: Add an `editable` option that locks bounding boxes, to better support token classification. This is a very small change in code that would substantially improve the labeling process.

Token classification (for documents) has gained considerable traction in recent years. Recent models such as Microsoft's LayoutLM family push the boundaries of what is possible. In Label Studio, the workflow is as follows:

  1. A PDF document is converted to images, and for each word (token) the respective bounding box and text are sent to Label Studio.
  2. The user assigns some of the bounding boxes (from the PDF text layer or the OCR engine) to a class, e.g. "total sum" for receipt understanding.
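Step 1 could be sketched roughly as follows. This is a minimal sketch, not the actual pipeline: the control names `bbox` and `transcription`, the object name `image`, and the data key `ocr` are assumptions that must match your own labeling config, and `build_task` and the token dictionaries are hypothetical helpers. Label Studio expects region coordinates as percentages of the image size, so pixel boxes are converted first.

```python
import json


def to_percent_box(box_px, image_width, image_height):
    """Convert a pixel box (x0, y0, x1, y1) to percentage-based coordinates."""
    x0, y0, x1, y1 = box_px
    return {
        "x": 100.0 * x0 / image_width,
        "y": 100.0 * y0 / image_height,
        "width": 100.0 * (x1 - x0) / image_width,
        "height": 100.0 * (y1 - y0) / image_height,
        "rotation": 0,
    }


def build_task(image_url, tokens, image_width, image_height):
    """Build one task with a pre-annotated box and transcription per token.

    The names "bbox", "transcription", "image" and "ocr" are assumptions;
    they have to match the names used in your labeling config.
    """
    results = []
    for index, token in enumerate(tokens):
        value = to_percent_box(token["box"], image_width, image_height)
        common = {
            "id": f"token-{index}",  # shared id links box and text
            "to_name": "image",
            "original_width": image_width,
            "original_height": image_height,
        }
        results.append(
            {**common, "from_name": "bbox", "type": "rectangle", "value": value}
        )
        results.append(
            {
                **common,
                "from_name": "transcription",
                "type": "textarea",
                "value": {**value, "text": [token["text"]]},
            }
        )
    return {"data": {"ocr": image_url}, "predictions": [{"result": results}]}


if __name__ == "__main__":
    tokens = [{"text": "Total", "box": (100, 200, 300, 250)}]
    task = build_task("https://example.com/page-1.png", tokens, 1000, 1000)
    print(json.dumps(task, indent=2))
```

The resulting dictionary can be written to a JSON file and imported as a task; the annotator then only needs to attach a class to the existing regions.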

Problem: With a custom template based on the provided OCR template, the user can assign each bounding box to a class. But they can also edit the shape of the bounding box itself. For token classification this is unwanted behavior: only the class needs to be changed.

Describe the solution you'd like Similar to the text area, it would be great to have an `editable` option for the polygon/bounding box that can be used to lock it.

Describe alternatives you've considered An alternative would be to tell the labeling personnel not to move the bounding boxes, or to realign the boxes via post-processing. Neither is feasible: first, mistakes such as accidentally moving a polygon will eventually happen (and might cause confusion), and second, re-creating the original polygon positions is hard, especially for edge cases.
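To illustrate what the post-processing alternative involves, here is a minimal sketch that compares exported regions against the original pre-annotations and resets any moved geometry. It assumes region ids survive annotation unchanged; `restore_boxes` is a hypothetical helper, not part of Label Studio. Regions that were deleted and redrawn get new ids and are not matched, which is exactly the kind of edge case that makes this approach fragile.

```python
import copy

GEOMETRY_KEYS = ("x", "y", "width", "height", "rotation")


def restore_boxes(original_results, annotated_results, tolerance=1e-6):
    """Reset geometry of annotated regions to the original pre-annotation.

    Returns (restored_results, moved_ids). Regions are matched by "id" only,
    so regions with unknown ids are left untouched.
    """
    originals = {
        region["id"]: region["value"]
        for region in original_results
        if region.get("type") == "rectangle"
    }
    restored = copy.deepcopy(annotated_results)  # never mutate the export
    moved_ids = []
    for region in restored:
        source = originals.get(region.get("id"))
        if source is None:
            continue  # new or re-created region: cannot be matched
        value = region.setdefault("value", {})
        for key in GEOMETRY_KEYS:
            if key not in source:
                continue
            current = value.get(key)
            if current is None or abs(current - source[key]) > tolerance:
                if region["id"] not in moved_ids:
                    moved_ids.append(region["id"])
                value[key] = source[key]
    return restored, moved_ids
```

The list of moved ids could be logged to see how often boxes are changed by accident.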

makseq commented 1 year ago

Thank you for your idea!

Something you can check right now is read-only regions: https://labelstud.io/guide/predictions.html#Read-only-and-hidden-regions. A read-only region doesn't allow bboxes to be modified at all, but maybe it will work for you too.
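A minimal sketch of how this might be applied to pre-annotated tasks before import. It assumes, based on the linked guide, that a region is marked read-only by a top-level `"readonly": true` field on each result item; please verify the field name against the guide for your Label Studio version. `lock_regions` is a hypothetical helper.

```python
import copy


def lock_regions(task):
    """Return a copy of the task with every predicted region marked read-only.

    Assumption: a region is locked via a top-level "readonly" flag on the
    result item, as described in the read-only regions section of the guide.
    """
    locked = copy.deepcopy(task)  # leave the input task untouched
    for prediction in locked.get("predictions", []):
        for region in prediction.get("result", []):
            region["readonly"] = True
    return locked


if __name__ == "__main__":
    task = {
        "data": {"ocr": "https://example.com/page-1.png"},
        "predictions": [
            {"result": [{"id": "token-0", "type": "rectangle", "value": {}}]}
        ],
    }
    print(lock_regions(task)["predictions"][0]["result"][0])
```

Whether the class of a read-only region can still be changed should be checked in the UI, since that is what token classification needs.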