dunso / pdf-parser

Convert PDF content and layout information with pdf.js
Apache License 2.0
21 stars 7 forks

Duplicate pageIds #1

Closed Dragomir-Ivanov closed 6 years ago

Dragomir-Ivanov commented 6 years ago

Hello there. When parsing a PDF, I see duplicate pageIds on different pages. Also, the pages don't seem to be in order, and pageId doesn't seem to be the actual page number in the document.

EDIT: Forgot to mention that my document is more than 100 pages long. It seems that the hundreds digit is missing. Maybe updating the PDF.js library can help.

dunso commented 6 years ago

I have fixed the bug that made pageId incorrect when a PDF is longer than 100 pages. Thanks for reporting the issue. The pages are not returned in order: they are parsed in no particular order, so you need to sort them by pageId. pageId starts from 0, so pageId plus 1 is the actual page number. Please try again.
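For readers landing here, the sorting step described above could look roughly like the sketch below. It assumes the parser output is an array of objects that each carry a numeric `pageId`; the `ParsedPage` interface and the `sortPages` helper are hypothetical names for illustration, not part of the pdf-parser API.

```typescript
// Hypothetical shape of one parsed page; only `pageId` is taken from the thread.
interface ParsedPage {
  pageId: number; // zero-based, per the maintainer's comment
}

// Returns a new array ordered by pageId, with a one-based pageNumber added.
function sortPages<T extends ParsedPage>(
  pages: T[],
): (T & { pageNumber: number })[] {
  return [...pages] // copy so the caller's array is not mutated
    .sort((a, b) => a.pageId - b.pageId) // numeric sort, not lexicographic
    .map((page) => ({ ...page, pageNumber: page.pageId + 1 }));
}
```

The explicit numeric comparator matters for long documents: the default `Array.prototype.sort` compares elements as strings, which would place a pageId of 100 before 20.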

Dragomir-Ivanov commented 6 years ago

Thanks for the fix, @dunso. Unfortunately, I had to use another library because of this bug, and my project is already done. However, I might come back to your lovely library when the time comes. Cheers!