Open venkat-amballa opened 4 years ago
Unfortunately, this is probably a by-product of the very lenient way pdfium (Chromium's PDF engine) handles the PDF specification. pdf2image relies on poppler, which is stricter. I am afraid that unless you can parse the document with pdftoppm -r 200 your_pdf.pdf out, I cannot help you.
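For anyone who wants to run that check from a script, here is a minimal sketch. It assumes poppler's pdftoppm is installed and on PATH; the helper names build_pdftoppm_command and render_with_pdftoppm are made up for illustration and are not part of pdf2image. It only runs the command quoted above and reports the exit code and whatever pdftoppm wrote to stderr.

```python
import shutil
import subprocess


def build_pdftoppm_command(pdf_path, out_prefix, dpi=200):
    """Return the argument list for: pdftoppm -r <dpi> <pdf_path> <out_prefix>."""
    return ["pdftoppm", "-r", str(dpi), pdf_path, out_prefix]


def render_with_pdftoppm(pdf_path, out_prefix, dpi=200):
    """Run pdftoppm and return (exit_code, stderr_text).

    Hypothetical helper: it does not interpret the messages, it only
    surfaces them so they can be pasted into the issue.
    """
    if shutil.which("pdftoppm") is None:
        raise FileNotFoundError("pdftoppm not found on PATH; install poppler first")
    result = subprocess.run(
        build_pdftoppm_command(pdf_path, out_prefix, dpi),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # "your_pdf.pdf" and "out" are the placeholder names from the comment above.
    code, errors = render_with_pdftoppm("your_pdf.pdf", "out")
    print("exit code:", code)
    print(errors or "no messages on stderr")
```

If pdftoppm prints syntax errors or the images it writes are corrupted in the same way, the problem is most likely in how poppler reads the file rather than in pdf2image itself.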
Problem: When I tried to split a PDF into individual pages, I found that the data in some of the pages is corrupted. I can see the corresponding page content clearly in the Chrome PDF viewer, but the output for that page from convert_from_path looks corrupted, as shown below.
Because of some sensitive content, I can't share the complete PDF.
Screenshots
This is page 21 of the PDF: (screenshot not included)
This is the individual page 21, which is the output from convert_from_path: (screenshot not included)
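To reproduce this on one page without sharing the whole file, something like the sketch below may help. It assumes pdf2image and poppler are installed. The helper names single_page_kwargs and render_page, and the file name your_pdf.pdf, are placeholders rather than anything from the original report. Rendering the same page with the pdftocairo backend as well is just a diagnostic idea, not a confirmed fix.

```python
def single_page_kwargs(page_number, dpi=200):
    """Keyword arguments that restrict convert_from_path to one page."""
    return {"dpi": dpi, "first_page": page_number, "last_page": page_number}


def render_page(pdf_path, page_number, dpi=200, use_pdftocairo=False):
    """Render a single page and return it as a PIL image.

    Hypothetical helper; the import is done here so the module can be
    loaded even when pdf2image is not installed.
    """
    from pdf2image import convert_from_path

    images = convert_from_path(
        pdf_path,
        use_pdftocairo=use_pdftocairo,
        **single_page_kwargs(page_number, dpi),
    )
    return images[0]


if __name__ == "__main__":
    # Page 21 is the page shown in the screenshots above.
    render_page("your_pdf.pdf", 21).save("page21_pdftoppm.png")
    render_page("your_pdf.pdf", 21, use_pdftocairo=True).save("page21_pdftocairo.png")
```

If both outputs are corrupted in the same way, that would point at poppler's reading of the file rather than at one particular backend.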
Desktop:
Additional context
Upon observing the result from various sources like:
I am starting to think that PDF is no longer a Portable Document Format.