Open samayer12 opened 4 years ago
Description
With Complex_1.pdf as the source document, numbered paragraph 44 is improperly processed as a bare "44." marker with no text attached.
Paragraph 44's text is found after the "45." marker instead.
To Reproduce
textract.process('Complex_1.pdf', method='pdfminer').decode()
Expected Output
[...] 44. PFAS are a class of chemicals encompassing more than 5,000 unique substances. 45. Scientific research demonstrates that members of the class of PFAS can have [...]
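A quick way to spot this symptom in extracted text is to look for paragraph numbers that sit alone on a line. The sketch below is only an illustration: `find_orphaned_numbers` and the `ORPHAN` pattern are hypothetical names, and `sample` is a hand-written stand-in for the output described above, not the real output of Complex_1.pdf.

```python
import re

# Hypothetical pattern: a line holding only a paragraph number such as "44."
ORPHAN = re.compile(r"^\s*(\d+)\.\s*$")


def find_orphaned_numbers(text):
    """Return paragraph numbers that appear alone on a line, detached from their text."""
    orphans = []
    for line in text.splitlines():
        match = ORPHAN.match(line)
        if match:
            orphans.append(int(match.group(1)))
    return orphans


# Hand-written approximation of the reported symptom (assumed, not actual extractor output).
sample = "\n".join([
    "43. Some earlier paragraph.",
    "44.",
    "45.",
    "PFAS are a class of chemicals encompassing more than 5,000 unique substances.",
])

print(find_orphaned_numbers(sample))  # prints [44, 45]
```

To check the real document, the string returned by `textract.process('Complex_1.pdf', method='pdfminer').decode()` could be passed in place of `sample`. An empty list would suggest every numbered paragraph kept its text on the same line.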
Desktop:
OS: 5.6.16-1-MANJARO
Textract version: 1.6.3
Python version: 3.8.3
Virtual environment: No
Additional Info
Possibly better-suited for this fork of pdfminer?