SamEdwardes / spacypdfreader

Easy PDF to text to spaCy text extraction in Python.
https://samedwardes.github.io/spacypdfreader/
MIT License

Is there any way to remove the header and footer from a PDF? #13

Closed: hemilparmar closed this issue 1 year ago

hemilparmar commented 1 year ago

Is there any way to remove the header and footer from a PDF, so that only the natural body text is extracted?

SamEdwardes commented 1 year ago

Sorry, there is no way to do this within spacypdfreader. You could use other tools to remove the headers and footers from the PDF before passing it into spacypdfreader.
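
For anyone landing here later, one possible workaround is to clean the text yourself. The sketch below is not part of spacypdfreader. It assumes you already have one text string per page from whatever PDF extraction tool you use, and it drops the first and last lines of a page when they repeat on most pages. The function name `strip_repeated_edges` and its parameters are hypothetical, and the heuristic will miss headers that span several lines unless `edge_lines` is raised.

```python
import re
from collections import Counter


def _key(line):
    # Normalize digits so "Page 1" and "Page 2" count as the same footer.
    return re.sub(r"\d+", "#", line.strip())


def strip_repeated_edges(pages, edge_lines=1, min_share=0.6):
    """Drop top/bottom lines that repeat on at least `min_share` of pages.

    `pages` is a list of strings, one per PDF page (an assumption about
    how the text was extracted upstream).
    """
    # Split each page into non-blank lines.
    split_pages = [[ln for ln in page.splitlines() if ln.strip()] for page in pages]

    # Count on how many pages each normalized line sits at the top or bottom edge.
    edge_counts = Counter()
    for lines in split_pages:
        edge_counts.update({_key(ln) for ln in lines[:edge_lines] + lines[-edge_lines:]})

    # A line must appear on at least two pages to be treated as a header/footer.
    threshold = max(2, min_share * len(pages))
    repeated = {key for key, count in edge_counts.items() if count >= threshold}

    cleaned = []
    for lines in split_pages:
        kept = []
        for i, ln in enumerate(lines):
            at_edge = i < edge_lines or i >= len(lines) - edge_lines
            if at_edge and _key(ln) in repeated:
                continue  # looks like a running header or footer
            kept.append(ln)
        cleaned.append("\n".join(kept))
    return cleaned


pages = [
    "ACME Report\nFirst page body.\nPage 1",
    "ACME Report\nSecond page body.\nPage 2",
    "ACME Report\nThird page body.\nPage 3",
]
cleaned_pages = strip_repeated_edges(pages)
```

The cleaned strings can then be joined and passed to a spaCy pipeline as ordinary text. If the headers and footers always occupy a fixed region of the page, cropping that region with a PDF tool before extraction may be more reliable than this text-based heuristic.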