atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Camelot does not strictly respect the specified table_areas: the parsed output may extend slightly beyond them #455

Open boy-be-ambitious opened 3 years ago

boy-be-ambitious commented 3 years ago

Hi all, when I use Camelot to parse a PDF table with specified table areas, the result sometimes extends slightly beyond those areas, and I want to avoid that. Here is my PDF; please see page 3: 14-较低活化凝血时间下心房颤动消融术并发症的发生率及出血并发症的危险因素.pdf (Incidence of complications of atrial fibrillation ablation at a lower activated clotting time and risk factors for bleeding complications). In my screenshot I have colored the table title red; I don't want that title to appear in the parsed result.

My expected table areas are ['309.12, 756.2, 538.56, 606.36'], and I visualized the corresponding region with a tool to confirm it. But Camelot still parses the unwanted title, which actually lies outside this area. How can I make Camelot parse strictly within the specified table areas? Thanks!
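Per the Camelot docs, each table_areas entry is a string "x1,y1,x2,y2" where (x1, y1) is the left-top corner and (x2, y2) the right-bottom corner in PDF coordinate space (origin at the bottom-left of the page). The sketch below is only an illustration of why "inside the area" can mean different things: a test based on a text fragment's center point keeps a title line that straddles the top edge, while a strict test requires the whole bounding box to lie inside. Whether Camelot itself uses a center-based test is an assumption here, not something this issue confirms. All names (TextBox, parse_area, center_inside, strictly_inside, filter_boxes) are hypothetical, not Camelot API, and the text-box coordinates are made-up example values; in practice you would have to obtain real boxes yourself (e.g. from a PDF layout library) and post-filter them.

```python
from typing import List, NamedTuple, Tuple


class TextBox(NamedTuple):
    """Hypothetical text fragment with a bounding box in PDF coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def parse_area(area: str) -> Tuple[float, float, float, float]:
    """Parse an 'x1,y1,x2,y2' (left-top, right-bottom) string into
    (left, bottom, right, top)."""
    x1, y1, x2, y2 = (float(v) for v in area.split(","))
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def center_inside(box: TextBox, area: str) -> bool:
    """Loose test: only the box's center point must fall inside the area."""
    left, bottom, right, top = parse_area(area)
    cx = (box.x0 + box.x1) / 2
    cy = (box.y0 + box.y1) / 2
    return left <= cx <= right and bottom <= cy <= top


def strictly_inside(box: TextBox, area: str) -> bool:
    """Strict test: the entire box must lie inside the area (no tolerance)."""
    left, bottom, right, top = parse_area(area)
    return (left <= box.x0 and box.x1 <= right
            and bottom <= box.y0 and box.y1 <= top)


def filter_boxes(boxes: List[TextBox], area: str) -> List[TextBox]:
    """Keep only the boxes that are strictly inside the area."""
    return [b for b in boxes if strictly_inside(b, area)]


area = "309.12, 756.2, 538.56, 606.36"  # the area from this issue
boxes = [
    TextBox(320.0, 750.0, 500.0, 762.0, "Table title"),  # straddles top edge
    TextBox(320.0, 700.0, 400.0, 710.0, "Cell A"),       # fully inside
    TextBox(530.0, 650.0, 545.0, 660.0, "Overhang"),     # crosses right edge
]

print([b.text for b in boxes if center_inside(b, area)])
# ['Table title', 'Cell A', 'Overhang']
print([b.text for b in filter_boxes(boxes, area)])
# ['Cell A']
```

The title's center (y = 756.0) sits just below the area's top edge (y = 756.2), so the loose test keeps it even though most of the line is outside; the strict test drops it.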
