Closed by christinestraub 4 months ago
Closes #2896.
This PR fixes partition_pdf() so that spaces in extracted text are preserved. When inferred and embedded elements are merged, the control character \t is now replaced with a space instead of being removed.
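The difference between removing \t and replacing it with a space can be illustrated with a minimal sketch. The helper names below are hypothetical and are not the functions in unstructured; the sketch also assumes that the embedded PDF text separates words with tab characters, which is what the description above implies.

```python
import unicodedata


def drop_control_chars(text: str) -> str:
    """Hypothetical helper: remove every Unicode control character (category 'Cc').

    A tab is a control character, so it is removed too and the words
    on either side of it end up glued together.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


def tabs_to_spaces_then_drop(text: str) -> str:
    """Hypothetical helper: turn tabs into spaces, then remove other control characters."""
    return drop_control_chars(text.replace("\t", " "))


# Assumed sample: words separated by tabs.
sample = "Name\tof\teach\texchange"

print(drop_control_chars(sample))        # Nameofeachexchange
print(tabs_to_spaces_then_drop(sample))  # Name of each exchange
```

Replacing the tab before stripping keeps the word boundary while still discarding the remaining control characters.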
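Testing
PDF: rok_20230930_1-1.pdf
from unstructured.partition.pdf import partition_pdf

# Partition the sample PDF with the hi_res strategy and print one element.
elements = partition_pdf(
    filename="rok_20230930_1-1.pdf",
    strategy="hi_res",
)
print(str(elements[20]))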
Results:
With this fix (spaces kept): Name of each exchange on which registered New York Stock Exchange
Before this fix (spaces dropped): Nameofeachexchangeonwhichregistered NewYorkStockExchange