I'm working on a set of PDFs with varied page layouts, e.g. multi-column and single-column pages containing images, figures, and tables. The results are below average, even though I have tried several of the provided models.
Can anyone suggest how to get better results on such diverse pages?
Also, and most importantly, I need the reading order of the blocks in the returned Layout object. How can I get this? Example output below:
Layout(_blocks=[TextBlock(block=Rectangle(x_1=201.03945922851562, y_1=413.36480712890625, x_2=1500.326904296875, y_2=1290.304931640625), text=None, id=None, type=Figure, parent=None, next=None, score=0.9502648115158081), TextBlock(block=Rectangle(x_1=174.8553466796875, y_1=270.81329345703125, x_2=1229.29443359375, y_2=416.44305419921875), text=None, id=None, type=Title, parent=None, next=None, score=0.9470152258872986), TextBlock(block=Rectangle(x_1=200.6973419189453, y_1=489.5100402832031, x_2=560.0352172851562, y_2=518.6799926757812), text=None, id=None, type=Text, parent=None, next=None, score=0.8652349710464478), TextBlock(block=Rectangle(x_1=260.79986572265625, y_1=1346.680419921875, x_2=1495.7305908203125, y_2=1452.4434814453125), text=None, id=None, type=Text, parent=None, next=None, score=0.8538650870323181)], page_data={})
Let me know if anyone has any suggestion/solution to improve it. Thanks a lot.
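To make the reading-order part of the question concrete, here is a minimal sketch of a purely geometric ordering heuristic. It is not a feature of the library. It uses a hypothetical `Block` dataclass that only mirrors the `x_1`, `y_1`, `x_2`, `y_2` and `type` fields visible in the output above, and it assumes a page width of 1700 px, at most two columns, and a `wide_ratio` threshold of 0.6 for deciding that a block spans the full page. All of these are illustrative assumptions, and the coordinates are rounded from the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in mirroring the coordinate fields shown above."""
    x_1: float
    y_1: float
    x_2: float
    y_2: float
    type: str


def reading_order(blocks: List[Block], page_width: float,
                  wide_ratio: float = 0.6) -> List[Block]:
    """Order blocks top to bottom, reading the left column before the right.

    Blocks at least `wide_ratio` of the page wide are treated as full-width
    and split the page into horizontal bands. Inside each band, the left
    column is read first, then the right column, each from top to bottom.
    """
    mid = page_width / 2
    ordered: List[Block] = []
    band: List[Block] = []

    def flush() -> None:
        # Emit the pending narrow blocks: left column, then right column.
        left = [b for b in band if (b.x_1 + b.x_2) / 2 < mid]
        right = [b for b in band if (b.x_1 + b.x_2) / 2 >= mid]
        ordered.extend(sorted(left, key=lambda b: b.y_1))
        ordered.extend(sorted(right, key=lambda b: b.y_1))
        band.clear()

    for b in sorted(blocks, key=lambda b: b.y_1):
        if (b.x_2 - b.x_1) >= wide_ratio * page_width:
            flush()            # close the band above this full-width block
            ordered.append(b)
        else:
            band.append(b)
    flush()
    return ordered


# The four blocks from the output above, in detection (score) order.
blocks = [
    Block(201.0, 413.4, 1500.3, 1290.3, "Figure"),
    Block(174.9, 270.8, 1229.3, 416.4, "Title"),
    Block(200.7, 489.5, 560.0, 518.7, "Text"),
    Block(260.8, 1346.7, 1495.7, 1452.4, "Text"),
]
ordered = reading_order(blocks, page_width=1700)  # page width is assumed
for index, block in enumerate(ordered):
    print(index, block.type, block.y_1)
```

With these inputs the title comes first, then the figure, then the two text blocks. A heuristic like this will likely break on pages with three or more columns, sidebars, or blocks that overlap heavily, so it is only a starting point.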