What I see is that block detection is too sensitive: instead of returning one block of text, the model returns four blocks. The input is an article from PubMed.
What is the best practice in such a case?
1) Labeling additional data and fine-tuning the model?
2) Post-processing using the coordinates? (too hacky)
3) Is there any other model that is less sensitive?
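For option 2, this is roughly the kind of post-processing I have in mind. It is only a minimal sketch, assuming each detected text block can be reduced to an `(x1, y1, x2, y2)` tuple (I believe layoutparser exposes this as `block.coordinates`). The helper names `horizontal_overlap` and `merge_text_boxes` and the thresholds `max_gap` and `min_overlap` are my own placeholders, not part of layoutparser, and would need tuning per document.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) in pixels, origin at the top-left of the page image
Box = Tuple[float, float, float, float]


def horizontal_overlap(a: Box, b: Box) -> float:
    """Fraction of the narrower box's width that both boxes share."""
    overlap = min(a[2], b[2]) - max(a[0], b[0])
    narrower = min(a[2] - a[0], b[2] - b[0])
    if narrower <= 0:
        return 0.0
    return max(overlap, 0.0) / narrower


def merge_text_boxes(
    boxes: List[Box],
    max_gap: float = 15.0,      # assumed: largest vertical gap (px) still treated as one block
    min_overlap: float = 0.8,   # assumed: required horizontal overlap to count as the same column
) -> List[Box]:
    """Greedily merge boxes that are stacked closely within the same column."""
    merged: List[Box] = []
    # Process boxes top to bottom so each one is compared with blocks above it.
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):
        for i, prev in enumerate(merged):
            gap = box[1] - prev[3]  # negative when the boxes overlap vertically
            if gap <= max_gap and horizontal_overlap(prev, box) >= min_overlap:
                # Grow the existing block to the union of both rectangles.
                merged[i] = (
                    min(prev[0], box[0]),
                    min(prev[1], box[1]),
                    max(prev[2], box[2]),
                    max(prev[3], box[3]),
                )
                break
        else:
            merged.append(box)
    return merged


# Four fragments of one paragraph in the left column, plus one right-column block.
fragments = [
    (50, 100, 300, 150),
    (50, 155, 300, 200),
    (50, 205, 300, 250),
    (50, 256, 300, 300),
    (320, 100, 570, 300),
]
result = merge_text_boxes(fragments)
```

With these made-up coordinates the four left-column fragments collapse into one box and the right-column block is left alone, but real pages with headings or figures between paragraphs would need extra rules, which is why this feels hacky to me.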
I am using layoutparser '0.3.4' through
! pip install layoutparser torchvision && pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.5#egg=detectron2"
in Colab.
My model is