MellowKyler / pdfp

PDF Processor - a GUI for some common PDF operations.
https://pypi.org/project/pdfp
GNU Affero General Public License v3.0

Exclude Footnotes From Text Extraction #4

Open MellowKyler opened 3 months ago

MellowKyler commented 3 months ago

No idea how to implement this yet. I believe some tools can divide a PDF page into boxes of text, and I may be able to exclude footnotes based on those boxes. Font size is potentially another signal, since footnotes are usually set smaller than body text. ML tools are also a possibility.
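
A rough sketch of how the box and font-size ideas might combine, not a tested implementation for pdfp. It assumes the text boxes have already been extracted by some library (PyMuPDF's `page.get_text("dict")`, for example, reports bounding boxes and font sizes) and converted into the hypothetical `Block` records below. The `Block` class, the function names, and the `min_gap` and `lower_fraction` thresholds are all made up for illustration, and the y coordinate is assumed to grow downward from the top of the page.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical text block: one box of text with its position and font size."""
    text: str
    y0: float         # top edge, measured downward from the top of the page
    font_size: float  # dominant font size of the block, in points


def body_font_size(blocks):
    """Return the font size that covers the most characters on the page."""
    weights = Counter()
    for b in blocks:
        weights[round(b.font_size, 1)] += len(b.text)
    return weights.most_common(1)[0][0]


def drop_footnotes(blocks, page_height, min_gap=1.0, lower_fraction=0.6):
    """Keep every block unless it is both small-font and low on the page."""
    if not blocks:
        return []
    body = body_font_size(blocks)
    kept = []
    for b in blocks:
        is_small = b.font_size <= body - min_gap       # noticeably smaller than body text
        is_low = b.y0 >= page_height * lower_fraction  # starts in the lower part of the page
        if not (is_small and is_low):
            kept.append(b)
    return kept


# Made-up page: a heading, two body paragraphs, and one footnote.
page = [
    Block("Chapter 1", 40, 18.0),
    Block("Body text that makes up most of the page." * 5, 100, 11.0),
    Block("More body text near the bottom of the page.", 650, 11.0),
    Block("1. A footnote in smaller type.", 720, 8.0),
]
text = "\n".join(b.text for b in drop_footnotes(page, page_height=792))
```

Requiring both conditions keeps small captions near the top of a page and normal-size text near the bottom. It would still miss footnotes set at body size and could wrongly drop small-print captions low on the page, so the thresholds would need tuning against real documents.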