dhlab-epfl / dhSegment

Generic framework for historical document processing
https://dhlab-epfl.github.com/dhSegment
GNU General Public License v3.0

Text-line Detection #16

Closed · ghost closed this issue 5 years ago

ghost commented 5 years ago

@SeguinBe @solivr Thank you for your hard work.

Regarding training a text-line detector from scratch,

  1. Since I'm only interested in text lines, can I just color the text-line boxes in red and write the classes.txt file as:

    0 0 0
    255 0 0
  2. What is a suitable config.json for training a text-line detector from scratch? demo_config.json requires a pretrained_model named resnet50, which was not trained for line detection.

    python train.py with demo/demo_config.json
  3. @solivr you suggested in issue #9 to use the cBAD dataset for baseline detection. I have downloaded this dataset and noticed that the annotations are in PAGE XML, so how can I use this dataset with its PAGE XML for training?


  4. You have mentioned on your GitHub main page that you used annotator1 from PageNet. How did you use it, given that the annotations are in x1, y1, x2, y2, x3, y3, x4, y4 format?

  5. You have also mentioned that the training images have been downsized to 1M pixels each. Does downsizing the training images reduce the recognition quality?

SeguinBe commented 5 years ago

Most of your questions are answered in the accompanying publication. We always convert what we want to detect into an image of labels, which is a task-dependent process. Some code for this can be found in the relevant folders of the exps directory.
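
As a rough illustration of that label-image idea (not dhSegment's actual code in exps/), here is a minimal sketch assuming the text lines are available as axis-aligned boxes and using the black/red classes.txt from the question above; `draw_label_image` and the box format are illustrative only:

    from PIL import Image
    import numpy as np

    def draw_label_image(image_path, text_line_boxes, output_path):
        """Paint text-line boxes in red (255 0 0) on a black (0 0 0) background."""
        width, height = Image.open(image_path).size
        label = np.zeros((height, width, 3), dtype=np.uint8)   # background class: black
        for x_min, y_min, x_max, y_max in text_line_boxes:     # assumed box format
            label[y_min:y_max, x_min:x_max] = (255, 0, 0)      # text-line class: red
        Image.fromarray(label).save(output_path)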

Pretrained models are widely seen as a good initialization, even if they were trained for something different.

The images are downscaled because the network only sees a local context of a fixed number of pixels. Scans are often very high resolution, which makes the local context uninformative. In short, downsizing is important to attain good performance, and it is done on the fly by dh_segment.
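
To make the 1M-pixel downsizing concrete, this is the kind of arithmetic involved (illustrative only, not dh_segment's internal code); `resized_shape` is a made-up name, and the target value corresponds to the `input_resized_size` shown in the training parameters further down:

    import math

    def resized_shape(height, width, target_pixels=1_000_000):
        # Scale so the output keeps the aspect ratio and contains ~target_pixels pixels
        scale = math.sqrt(target_pixels / (height * width))
        return round(height * scale), round(width * scale)

    # e.g. a 3000 x 4000 scan (12 MP) becomes roughly 866 x 1155 (~1 MP)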

solivr commented 5 years ago

Some additional details on your questions:

  1. Yes, you can.
  2. The pretrained_model is used for the encoder part of the network (as @SeguinBe suggested, have a look at the paper), so you'll need it for any task if you want it to be effective. Regarding the config.json, you can copy demo_config.json and update the training_params like this:
    "batch_size": 16,
    "input_resized_size": 1000000,
    "make_patches": true,
    "patch_shape": [
        300,
        300
    ],
    "training_margin": 16
  3. For each image, you need to generate the corresponding label image. You can have a look at an example here: https://github.com/dhlab-epfl/dhSegment/blob/master/exps/cbad/utils.py (a rough sketch of this conversion also follows after this list).
  4. Same as point 3; you can look at https://github.com/dhlab-epfl/dhSegment/blob/master/exps/page/utils.py
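
For reference, a rough, hypothetical sketch of how a cBAD PAGE XML file could be rasterized into a baseline label image; the actual code in exps/cbad/utils.py differs, and the assumption here is that each Baseline element stores its points as a "x1,y1 x2,y2 ..." string in a points attribute:

    import xml.etree.ElementTree as ET
    import numpy as np
    import cv2

    def page_xml_to_label_image(xml_path, image_height, image_width, output_path, thickness=5):
        """Draw every Baseline of a PAGE XML file as a thick red polyline on black."""
        label = np.zeros((image_height, image_width, 3), dtype=np.uint8)  # background: black
        for element in ET.parse(xml_path).getroot().iter():
            if element.tag.endswith('Baseline'):            # ignore the PAGE namespace prefix
                points = np.array([[round(float(v)) for v in point.split(',')]
                                   for point in element.attrib['points'].split()],
                                  dtype=np.int32)
                cv2.polylines(label, [points], isClosed=False,
                              color=(255, 0, 0), thickness=thickness)
        cv2.imwrite(output_path, cv2.cvtColor(label, cv2.COLOR_RGB2BGR))  # cv2 writes BGR
        return label

For the page-extraction case (point 4), the idea is the same, except that the page border polygon would be filled (e.g. with cv2.fillPoly) instead of drawn as a polyline.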
ghost commented 5 years ago

Thank you both, you're amazing!