Model returns multiple text blocks instead of one

I am using layoutparser 0.3.4, installed in Colab with:

!pip install layoutparser torchvision && pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.5#egg=detectron2"

My model is:

import layoutparser as lp

model = lp.models.Detectron2LayoutModel('lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
                                        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.5],
                                        label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"})

What I see is that block detection is too sensitive: instead of returning one block of text, the model returns four blocks for the same region. The input is an article from PubMed.

What is the best practice in such a case?
1) Labeling additional data and fine-tuning the model?
2) Post-processing using the coordinates? (Seems too hacky.)
3) Is there another model that is less sensitive?
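To make option 2 concrete, here is a minimal sketch of coordinate-based post-processing. It is not part of layoutparser: `merge_text_boxes`, `horizontal_overlap_ratio`, and the thresholds `max_gap` and `min_overlap` are hypothetical names and values chosen for illustration, and they would need tuning per document type. The sketch works on plain `(x1, y1, x2, y2)` tuples; I am assuming these can be collected from the detected layout with something like `[b.coordinates for b in layout if b.type == "Text"]`.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), origin at top-left


def horizontal_overlap_ratio(a: Box, b: Box) -> float:
    """Horizontal overlap of two boxes, relative to the narrower box."""
    overlap = min(a[2], b[2]) - max(a[0], b[0])
    narrower = min(a[2] - a[0], b[2] - b[0])
    return max(overlap, 0.0) / narrower if narrower > 0 else 0.0


def merge_text_boxes(
    boxes: List[Box],
    max_gap: float = 15.0,      # hypothetical threshold, in pixels
    min_overlap: float = 0.8,   # hypothetical threshold, fraction of width
) -> List[Box]:
    """Greedily merge boxes that sit in the same column and are vertically close."""
    merged: List[Box] = []
    # Process boxes top to bottom so each one is compared with boxes above it.
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for i, m in enumerate(merged):
            gap = box[1] - m[3]  # negative when the boxes overlap vertically
            if gap <= max_gap and horizontal_overlap_ratio(box, m) >= min_overlap:
                # Replace the existing box with the union of the two.
                merged[i] = (
                    min(m[0], box[0]),
                    min(m[1], box[1]),
                    max(m[2], box[2]),
                    max(m[3], box[3]),
                )
                break
        else:
            merged.append(box)
    return merged


# Example: four fragments of one left-column paragraph plus one right-column block.
fragments = [
    (50, 100, 300, 200),
    (50, 210, 300, 320),
    (52, 328, 298, 400),
    (50, 410, 300, 500),
    (320, 100, 570, 500),
]
print(merge_text_boxes(fragments))
```

The horizontal-overlap check keeps neighbouring columns apart, and the vertical-gap check keeps distinct paragraphs apart. Whether this counts as too hacky probably depends on how regular the page layouts are.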

Layout-Parser / layout-parser, issue #173