SamEdwardes / spacypdfreader

Easy PDF to text to spaCy text extraction in Python.
https://samedwardes.github.io/spacypdfreader/
MIT License

Add support for page_range in pdf_reader #20

Closed. SamEdwardes closed 11 months ago

SamEdwardes commented 11 months ago

Adds a page_range argument to pdf_reader so that only part of a PDF is read. For example:

import spacy
from spacypdfreader import pdf_reader
from spacypdfreader.parsers import pytesseract

nlp = spacy.load("en_core_web_sm")
# Read only the pages selected by page_range, using 4 worker processes.
doc = pdf_reader(
    "tests/data/test_pdf_01.pdf", nlp, pytesseract.parser,
    n_processes=4, page_range=(2, 3),
)
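
To make the intended behaviour concrete, here is a minimal sketch of how a (start, end) tuple might be expanded into the page numbers to process. The helper name pages_to_process is hypothetical and not part of spacypdfreader, and the 1-based, inclusive bounds are an assumption made for illustration; the actual implementation may differ.

```python
from typing import List, Optional, Tuple


def pages_to_process(
    n_pages: int, page_range: Optional[Tuple[int, int]] = None
) -> List[int]:
    """Hypothetical helper: expand a (start, end) range into page numbers.

    Assumes 1-based, inclusive bounds; the library itself may behave differently.
    """
    if page_range is None:
        # No range given: fall back to every page in the document.
        return list(range(1, n_pages + 1))
    start, end = page_range
    if not (1 <= start <= end <= n_pages):
        raise ValueError(f"page_range {page_range} is outside 1..{n_pages}")
    return list(range(start, end + 1))


print(pages_to_process(5, page_range=(2, 3)))  # [2, 3]
print(pages_to_process(3))  # [1, 2, 3]
```

Validating the range up front means a bad tuple fails before any (potentially slow) OCR work is started.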

Closes #18 and closes #16.