evmer / perlego-downloader

Download books from Perlego.com in PDF format
MIT License
106 stars, 52 forks

Incorrect PDF #4

Closed: a-sinclaire closed this issue 1 year ago

a-sinclaire commented 1 year ago

Hi, don't know if you can help with this issue...

I eventually got the script to run after making two modifications. On line 140, I added an explicit encoding:

f = open(f'epub_{BOOK_ID}/{page_no}.html', 'w', encoding='utf-8')

On line 146, I added the enable-local-file-access option:

pdfkit.from_file([f'epub_{BOOK_ID}/{i}.html' for i in range(page_no)], f'{BOOK_ID}.pdf', options={'encoding': 'UTF-8', 'enable-local-file-access': None})
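For readers following along, the two changes can be sketched in isolation. This is only an illustration under assumptions, not the script's actual code: `write_page`, `page_paths`, `build_pdf`, and `PDF_OPTIONS` are hypothetical names, and only the `open(..., encoding='utf-8')` call and the `pdfkit.from_file` options come from the comment above. The explicit encoding matters because `open()` in text mode otherwise falls back to the platform's default encoding, which on Windows is often not UTF-8.

```python
from pathlib import Path

# Options copied from the comment above; a value of None marks a flag
# that takes no argument.
PDF_OPTIONS = {"encoding": "UTF-8", "enable-local-file-access": None}


def write_page(out_dir, page_no, html):
    """Write one page as UTF-8, independent of the platform default encoding."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{page_no}.html"
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path


def page_paths(out_dir, page_count):
    """List page files in numeric order: 0.html, 1.html, ..."""
    return [str(Path(out_dir) / f"{i}.html") for i in range(page_count)]


def build_pdf(out_dir, page_count, pdf_path):
    """Combine the pages into one PDF (requires pdfkit and wkhtmltopdf)."""
    import pdfkit  # imported lazily so the helpers above work without it

    pdfkit.from_file(page_paths(out_dir, page_count), str(pdf_path),
                     options=PDF_OPTIONS)
```

Calling `build_pdf('epub_1510704', page_no, '1510704.pdf')` would then mirror the modified line 146, assuming the pages were written with `write_page` first.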

However, the resulting PDF seems incomplete. When I delete the PDF and re-run the script, the PDF is always in the wrong order. Sometimes I get chapters 1, 3, 9, and 11 (and no others), and other times I get different chapters. The number of pages in the resulting PDF varies as well.
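The thread does not say what caused the missing chapters. One way to narrow down a symptom like this, assuming the pages are written as `0.html`, `1.html`, ... in a temporary directory as the lines quoted above suggest, is to check for missing or empty page files before the PDF is built. The helper name `find_bad_pages` is hypothetical and is not part of the script.

```python
from pathlib import Path


def find_bad_pages(out_dir, page_count):
    """Return the page indices whose HTML file is missing or empty."""
    bad = []
    for i in range(page_count):
        path = Path(out_dir) / f"{i}.html"
        if not path.is_file() or path.stat().st_size == 0:
            bad.append(i)
    return bad
```

If the returned list changes between runs, the problem is more likely in the download step than in the PDF conversion.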

evmer commented 1 year ago

Yeah, the script is not battle-tested for all kinds of books, so this issue was expected. Can you please give me the BOOK_ID so I can investigate the problem? Thank you!

jaan143 commented 1 year ago

@evmer try this one; it's missing chapters: https://www.perlego.com/book/2568251/english-rhythm-and-blues-where-language-and-music-come-together-pdf

evmer commented 1 year ago

The issue should be fixed now. Let me know if it works for you.

a-sinclaire commented 1 year ago

Looks like your fix totally worked for me! (BOOK_ID: 1510704)

I did get an error at the end, but it didn't seem to affect the output.

building pdf...
Traceback (most recent call last):
  File "C:\Users\lader\Desktop\perlego-downloader-main\downloader.py", line 158, in <module>
    pdfkit.from_file([f'{book_format}_{BOOK_ID}/{i}.html' for i in range(page_no)], f'{BOOK_ID}.pdf', options={'encoding': 'UTF-8'})
  File "C:\Users\lader\AppData\Local\Programs\Python\Python39\lib\site-packages\pdfkit\api.py", line 51, in from_file
    return r.to_pdf(output_path)
  File "C:\Users\lader\AppData\Local\Programs\Python\Python39\lib\site-packages\pdfkit\pdfkit.py", line 201, in to_pdf
    self.handle_error(exit_code, stderr)
  File "C:\Users\lader\AppData\Local\Programs\Python\Python39\lib\site-packages\pdfkit\pdfkit.py", line 155, in handle_error
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Exit with code 1 due to network error: ProtocolUnknownError

But once again, I got this error to stop appearing by editing line 146 to allow the PDF builder to access local files:

pdfkit.from_file([f'epub_{BOOK_ID}/{i}.html' for i in range(page_no)], f'{BOOK_ID}.pdf', options={'encoding': 'UTF-8', 'enable-local-file-access': None})

(The output looks identical to my eye even after the fix, so it probably doesn't matter for functionality, but without the fix the script is prevented from removing the temp EPUB directory.)
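The leftover temp directory is consistent with the exception being raised before the cleanup step runs. A minimal sketch of one way to make cleanup independent of the build outcome, assuming the script removes the directory right after the `pdfkit.from_file` call (the thread does not show that code), is a `try`/`finally` wrapper. The name `build_then_cleanup` and its `build` callback are hypothetical.

```python
import shutil


def build_then_cleanup(build, tmp_dir):
    """Run build(), then remove tmp_dir even if build() raised."""
    try:
        build()
    finally:
        # Runs on success and on failure; any exception from build()
        # still propagates to the caller afterwards.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

With this shape, an `OSError` from wkhtmltopdf would still surface, but the temporary HTML files would not be left behind.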

Either way, the big note is that the particular issue I was facing is now fixed! Thank you very much!

jaan143 commented 1 year ago

@a-sinclaire did you experience this? Maybe you can help me fix this issue: https://github.com/evmer/perlego-downloader/issues/5

a-sinclaire commented 1 year ago

> @a-sinclaire did you experience this? Maybe you can help me fix this issue: #5

Sorry! I was helping out a friend and don't have access to my own Perlego account, so I can't help with or test any more issues relating to this project. Best of luck!

jaan143 commented 1 year ago

@a-sinclaire I won't take your account; you can even have access to my account. But please, I need help fixing my issue :(