dhlab-epfl / dhSegment

Generic framework for historical document processing
https://dhlab-epfl.github.io/dhSegment
GNU General Public License v3.0

PredictionType.CLASSIFICATION and extracting rectangles #60

Closed tralfamadude closed 3 years ago

tralfamadude commented 3 years ago

I am attempting CLASSIFICATION now, not MULTILABEL. Issue https://github.com/dhlab-epfl/dhSegment/issues/29 was helpful in pointing out that mutually exclusive areas call for classification, not multilabel. This is clear in retrospect ;^)

Now I need to extract rectangles, and here I have hit a big gap in dhSegment. The demo.py code shows how to generate the rectangle corresponding to a skewed page, but it handles only one class. I modified demo.py to identify rectangles for each label; when there are multiple classes, this can produce spurious, overlapping rectangles.

How can I:

  1. Identify the highest-confidence instances of each class
  2. Ensure they do not overlap

The end result I want is one or more JPEGs, each associated with a particular class label, plus their coordinates within the input image.

Perhaps the labels plane in the prediction result offers some help here? demo.py does not use the labels plane.
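For what it's worth, here is a minimal sketch of per-class rectangle extraction. It assumes the prediction is an (H, W, C) softmax probability array (as dhSegment produces); `boxes_per_class`, the `min_area` cutoff, and using the mean class probability as a region confidence are my own choices, not dhSegment API:

```python
import numpy as np
from scipy import ndimage

def boxes_per_class(probs, min_area=20):
    """probs: (H, W, C) per-pixel class probabilities.
    Returns {class_index: [(x0, y0, x1, y1, confidence), ...]}."""
    labels = np.argmax(probs, axis=-1)           # pixel-wise winning class
    result = {}
    for c in range(probs.shape[-1]):
        comps, n = ndimage.label(labels == c)    # connected components of class c
        boxes = []
        for i in range(1, n + 1):
            ys, xs = np.nonzero(comps == i)
            if ys.size < min_area:
                continue                         # drop tiny spurious blobs
            conf = float(probs[ys, xs, c].mean())  # mean probability as confidence
            boxes.append((xs.min(), ys.min(), xs.max(), ys.max(), conf))
        result[c] = boxes
    return result
```

Each resulting box can then be cropped from the input image and saved as a JPEG for its class.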

SeguinBe commented 3 years ago

The core of dhSegment is to provide pixel-wise class predictions.

What can be done with these predictions is almost always task-dependent. While we provide some common post-processing functions (https://dhsegment.readthedocs.io/en/latest/reference/post_processing.html), you will most likely have to adapt them to what you want.

For instance, generating non-overlapping rectangles from the prediction map is a problem without a single clear solution.

As far as confidence is concerned, each pixel has a probability value, so you can leverage that.
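One common way to leverage those per-pixel probabilities for the overlap problem is greedy non-maximum suppression: score each candidate rectangle (e.g. by its mean probability), then keep the highest-scored box and drop any later box that overlaps it too much. A sketch, assuming boxes are `(x0, y0, x1, y1, confidence)` tuples; `iou_thresh` is an arbitrary choice:

```python
def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) pixel boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0 + 1) * max(0, iy1 - iy0 + 1)
    area = lambda r: (r[2] - r[0] + 1) * (r[3] - r[1] + 1)
    return inter / float(area(a) + area(b) - inter)

def nms(boxes, iou_thresh=0.3):
    """boxes: [(x0, y0, x1, y1, conf)]. Greedily keep the most confident
    boxes, discarding any box that overlaps an already-kept one."""
    kept = []
    for box in sorted(boxes, key=lambda b: -b[4]):
        if all(iou(box[:4], k[:4]) <= iou_thresh for k in kept):
            kept.append(box)
    return kept
```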

tralfamadude commented 3 years ago

Thanks, Benoit.

I improved the classification by working with simpler labeled regions. Previously I labeled "beginning of article", but that turned out to be complex and noisy because other, simpler labeled areas overlapped it visually. Using only simple, visually distinct annotations like {title, author}, plus post-processing that requires both to be present on one page, I have something that could work. If the content were all in the same font and size this might not work, but I don't see that (at least not yet).
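The "require both labels on one page" rule is simple to express. A sketch, assuming detections are grouped per label as in the earlier discussion; the label names `title` and `author` are from my annotation scheme, and the function name is hypothetical:

```python
def page_has_article_start(class_boxes, required=("title", "author")):
    """class_boxes: {label: [detected boxes]}.
    Accept a page only if every required label has at least one region."""
    return all(class_boxes.get(label) for label in required)
```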

SeguinBe commented 3 years ago

Glad to hear you managed to improve your pipeline.