evmer / perlego-downloader

Download books from Perlego.com in PDF format
MIT License
106 stars, 52 forks

Paragraphs issue #5

Closed: jaan143 closed this issue 1 year ago

jaan143 commented 1 year ago

Please check: the whole book has this text issue.

[Screenshot 2022-09-06 213612] [Screenshot 2022-09-06 213540]

This is the book link: https://www.perlego.com/book/3294395/second-language-pronunciation-bridging-the-gap-between-research-and-teaching-pdf

evmer commented 1 year ago

Thanks for reporting this. After some research I found that it is related to a pdfkit/wkhtmltopdf bug that has been open for 4+ years:

https://github.com/wkhtmltopdf/wkhtmltopdf/issues/3256
https://github.com/wkhtmltopdf/wkhtmltopdf/issues/45

Unfortunately there's no fix yet, so the only thing I can do is refactor the script, replacing pdfkit with another HTML-to-PDF library. It's going to take some time; maybe in the next few days I'll come up with something working.

jaan143 commented 1 year ago

@evmer you can check this topic; most people fixed it with the DPI setting, though for some it didn't work: https://stackoverflow.com/questions/34241932/letter-spacing-is-too-large-with-wkhtmltopdf
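A minimal sketch of that DPI workaround with pdfkit, assuming pdfkit and the wkhtmltopdf binary are installed. The 400 DPI value and the helper names are illustrative assumptions, not taken from downloader.py, and wkhtmltopdf's own help text says `--dpi` has no effect on X11-based systems, so results may vary by platform.

```python
from typing import Dict


def build_pdfkit_options(dpi: int = 400) -> Dict[str, int]:
    """Options dict for pdfkit; pdfkit passes each key to wkhtmltopdf as --key."""
    # A higher DPI is the workaround reported for wkhtmltopdf's oversized
    # letter spacing; 400 is an illustrative value, not a tuned one.
    return {"dpi": dpi}


def html_to_pdf(html_path: str, pdf_path: str, dpi: int = 400) -> None:
    """Convert one HTML file to PDF through pdfkit/wkhtmltopdf."""
    import pdfkit  # imported here so the options helper works without pdfkit

    pdfkit.from_file(html_path, pdf_path, options=build_pdfkit_options(dpi))
```

For example, `html_to_pdf("page.html", "page.pdf")` (hypothetical file names) would run wkhtmltopdf with `--dpi 400`.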

evmer commented 1 year ago

@jaan143 I refactored the script, replacing pdfkit with pyppeteer. This bug should now be fixed; you can try it yourself. Unfortunately the PDF building process has become slow because of the external font/image rendering; I hope to improve it in a future version.

Don't forget to update the Python requirements:

python3 -m pip install pyppeteer
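For comparison with the pdfkit approach, here is a minimal sketch of converting a local HTML file to PDF with pyppeteer. It is not the code in downloader.py; the function names, the `networkidle0` wait (so external fonts and images have a chance to load before printing) and `printBackground` are assumptions chosen for illustration.

```python
from pathlib import Path


def to_file_url(html_path: str) -> str:
    """Absolute file:// URL, so relative image/font paths in the HTML resolve."""
    return Path(html_path).resolve().as_uri()


async def html_file_to_pdf(html_path: str, pdf_path: str) -> None:
    from pyppeteer import launch  # assumption: pyppeteer is installed

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        # Wait until the network is idle so external fonts/images can load.
        await page.goto(to_file_url(html_path), waitUntil="networkidle0")
        await page.pdf(path=pdf_path, printBackground=True)
    finally:
        await browser.close()
```

Usage would be `asyncio.run(html_file_to_pdf("book.html", "book.pdf"))`; on the first run pyppeteer may also need to download Chromium, which can take a while.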

jaan143 commented 1 year ago

@evmer thanks for your efforts, dear :) Here is an error I get while converting to PDF with the new script:

page 347 downloaded
Traceback (most recent call last):
  File "downloader.py", line 183, in <module>
    asyncio.get_event_loop().run_until_complete(html2pdf())
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "downloader.py", line 114, in html2pdf
    browser = await launch(options={
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

PS C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main_4 (new pdf convert library)\perlego-downloader-main>

evmer commented 1 year ago

@jaan143 it seems your system is missing some required dependencies:

https://stackoverflow.com/questions/57217924/pyppeteer-errors-browsererror-browser-closed-unexpectedly
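When Chromium exits before pyppeteer can connect, the traceback alone does not say why. A minimal debugging sketch, assuming pyppeteer is installed: launching with pyppeteer's `dumpio` option forwards the browser's own stdout/stderr to the Python process, which may reveal the missing library or other startup error. The helper names are hypothetical, not part of downloader.py.

```python
from typing import Any, Dict


def debug_launch_options() -> Dict[str, Any]:
    """Launch options that surface Chromium's own output for troubleshooting."""
    return {
        "headless": True,
        # dumpio=True forwards the browser's stdout/stderr to this process,
        # so Chromium's startup errors are printed instead of captured silently.
        "dumpio": True,
    }


async def try_launch() -> None:
    from pyppeteer import launch  # assumption: pyppeteer is installed

    browser = await launch(debug_launch_options())
    print("Chromium started:", await browser.version())
    await browser.close()
```

Run it with `asyncio.run(try_launch())` and post whatever Chromium prints before the `BrowserError`.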

jaan143 commented 1 year ago

@evmer check this:

downloader.py:183: DeprecationWarning: There is no current event loop
  asyncio.get_event_loop().run_until_complete(html2pdf())

I have read a lot of topics, and people fix this issue in their own project code. I think you need to add a timeout, but I don't know exactly.

Here is the main repository: https://github.com/miyakogi/pyppeteer

It is also no longer being updated.

Have you tried it on Windows?

evmer commented 1 year ago

@jaan143 can you please describe your issue in more detail?

This is just a warning and shouldn't break the script execution:

downloader.py:183: DeprecationWarning: There is no current event loop
  asyncio.get_event_loop().run_until_complete(html2pdf())

Try reinstalling the latest version of Python and upgrading the required dependencies.
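On the warning itself: on recent Python versions (the later traceback in this thread is from Python 3.10), calling `asyncio.get_event_loop()` when no loop is running can emit this DeprecationWarning. A minimal sketch of the usual replacement, `asyncio.run()`, assuming the script's entry point can be changed; the `html2pdf` body here is a hypothetical stand-in for the script's coroutine. This only addresses the warning, not the `BrowserError`.

```python
import asyncio


async def html2pdf() -> str:
    # Hypothetical stand-in for the script's html2pdf() coroutine.
    await asyncio.sleep(0)
    return "pdf built"


def main() -> str:
    # asyncio.run() creates a new event loop, runs the coroutine to completion
    # and closes the loop, so no "current event loop" lookup is needed.
    return asyncio.run(html2pdf())


if __name__ == "__main__":
    print(main())
```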

jaan143 commented 1 year ago

@evmer actually the issue is the same one I showed above. I spent the whole day looking for help online (GitHub, Stack Overflow, etc.) but couldn't find a proper answer; most of what I found is Linux-related.

Traceback (most recent call last):
  File "downloader.py", line 183, in <module>
    asyncio.get_event_loop().run_until_complete(html2pdf())
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "downloader.py", line 114, in html2pdf
    browser = await launch(options={
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

PS C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main_4 (new pdf convert library)\perlego-downloader-main>

evmer commented 1 year ago

@jaan143 I updated the script; can you please try now?

jaan143 commented 1 year ago

@evmer OK, let me check and confirm.

jaan143 commented 1 year ago

@evmer still the same. What OS are you using?

C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main\downloader.py:184: DeprecationWarning: There is no current event loop
  asyncio.get_event_loop().run_until_complete(html2pdf())
Traceback (most recent call last):
  File "C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main\downloader.py", line 184, in <module>
    asyncio.get_event_loop().run_until_complete(html2pdf())
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 641, in run_until_complete
    return future.result()
  File "C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main\downloader.py", line 114, in html2pdf
    browser = await launch(options={
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

PS C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main>

evmer commented 1 year ago

@jaan143 I tested it on macOS, Linux (Debian) and Windows 10, so it seems to be a problem with your configuration.

Can you please follow these instructions to troubleshoot and post the output here?

jaan143 commented 1 year ago

@evmer he says to copy the command and run it in PowerShell or cmd, but he's asking about Docker or AWS. Anyway, I just copied and pasted it into my PowerShell; you can see the result in this screenshot: [Screenshot 2022-09-08 161259]

evmer commented 1 year ago

@jaan143 can you please copy-paste the printed command and run it? I mean this:

[image]

jaan143 commented 1 year ago

@evmer here it is: [Screenshot 2022-09-08 223526]

evmer commented 1 year ago

@jaan143 to run an executable from a path in PowerShell, you have to prefix it with a dot (`./`), so for you:

./root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome...etc.

jaan143 commented 1 year ago

@evmer here it is with the dot: [Screenshot 2022-09-08 224425]

evmer commented 1 year ago

@jaan143 I'm not familiar with PowerShell; try googling the error to get it working.

jaan143 commented 1 year ago

@evmer here it is in cmd: [Screenshot 2022-09-08 224751]

lilfmdude commented 1 year ago

[image: unknown]

This is what I get. I got this far; any ideas?

jaan143 commented 1 year ago

@evmer this script works the same way as yours: it downloads the HTML pages and then builds an EPUB. You could use it as a reference to produce an EPUB file, which would be even better: https://github.com/ilyakharlamov/bookmate_downloader

smack893 commented 1 year ago

Do you have any plans to update it?

lilfmdude commented 1 year ago

I don't know if I'm getting closer, but I'm getting this now:
[image]

lilfmdude commented 1 year ago

This one is new too. Hopefully I'm not annoying you with all this.
[image]

jaan143 commented 1 year ago

@evmer I don't want the HTML files to be deleted. Can you tell me which lines I need to remove from the script so the HTML files are kept?

evmer commented 1 year ago

Guys, this is not a chat group; please try to follow GitHub's guidelines. The original problem was solved by replacing pdfkit with pyppeteer, so I'll mark this as closed now. Feel free to open a new issue with detailed information about your problem.

@lilfmdude please see my reply to #7. @jaan143 I can't address issues with Puppeteer/pyppeteer. Your OS configuration may be different; try googling the error messages or ask in their GitHub repo.

evmer commented 1 year ago

@jaan143 I added some troubleshooting guidelines to the README; let me know if they solve your issue. Please consider downloading the latest version of the script; I fixed a few bugs.

jaan143 commented 1 year ago

@evmer it's done now. Thank you very, very much :)

evmer commented 1 year ago

@evmer its done now Thank you very very much :)

Finally! 😅 So was the missing chromedriver the issue?

jaan143 commented 1 year ago

@evmer no, it was already set in PATH. The problem was pyppeteer: I reinstalled it, and pyppeteer then re-downloaded Chromium. However, if I download a book in EPUB format, the top and bottom page margins are very close to the text, and the page width and height are very large. If I download in PDF format, it looks good.
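Based on that fix, a quick way to see which Chromium copy pyppeteer will use before reinstalling. A minimal sketch, assuming pyppeteer is installed: `pyppeteer.chromium_downloader.chromium_executable()` returns the path where pyppeteer expects its Chromium, and the `pyppeteer-install` command downloads it. pyppeteer normally fetches Chromium on first launch only when that file is missing, so a damaged copy may stay in place until it is deleted or pyppeteer is reinstalled. `describe_chromium` and `main` are hypothetical helpers.

```python
from pathlib import Path


def describe_chromium(exe: Path) -> str:
    """Human-readable status of the Chromium binary pyppeteer would launch."""
    if exe.is_file():
        # Present, but it could still be a damaged download.
        return f"FOUND: Chromium at {exe}; if launches still fail, delete it and run pyppeteer-install"
    return f"MISSING: no Chromium at {exe}; run pyppeteer-install to download it"


def main() -> None:
    from pyppeteer.chromium_downloader import chromium_executable

    print(describe_chromium(chromium_executable()))
```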

evmer commented 1 year ago

but if i download epub book format its top and bottom margins of pages are very close with text and also page width and height is very big

You can try adjusting the margins by setting a different value (in pixels) at line 209: https://github.com/evmer/perlego-downloader/blob/main/downloader.py#L209

      options['margin'] = {'top': '10', 'bottom': '10', 'left': '10', 'right': '10'}

Just try increasing them and see how the output looks.
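A minimal sketch of that adjustment, assuming `options` is the dict later passed to pyppeteer's `page.pdf()`, as in the line above. pyppeteer reads a bare number such as `'10'` as pixels and also accepts the units `px`, `in`, `cm` and `mm`; the `20mm` value and the helper name are illustrative assumptions.

```python
from typing import Dict

SIDES = ("top", "bottom", "left", "right")


def uniform_margin(value: str = "20mm") -> Dict[str, str]:
    """Same margin on every side, in the shape page.pdf() expects."""
    # "10" means 10 px; "20mm", "0.5in" or "1cm" carry explicit units.
    return {side: value for side in SIDES}


options: Dict[str, object] = {}
options["margin"] = uniform_margin("20mm")
print(options)
```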

evmer commented 1 year ago

problem was in pyppeteer which i reinstall and then pyppeteer reinstall chromium

Can you please describe the solution step by step? It's a very common issue, and you'd make many users happy. Thank you!

jaan143 commented 1 year ago

@evmer can I also set the page size, like A4 or Letter?
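As a sketch, assuming the same `options` dict that the script passes to `page.pdf()`: pyppeteer's `page.pdf()` accepts a `format` option with paper names such as `'A4'` or `'Letter'` (looked up case-insensitively), and when `format` is present it takes priority over `width`/`height`. The helper name and the format list below are assumptions based on pyppeteer's built-in paper sizes.

```python
from typing import Any, Dict

# Paper names pyppeteer's page.pdf() recognises (compared in lower case).
PAPER_FORMATS = {"letter", "legal", "tabloid", "ledger",
                 "a0", "a1", "a2", "a3", "a4", "a5"}


def with_paper_format(options: Dict[str, Any], name: str = "A4") -> Dict[str, Any]:
    """Return a copy of the page.pdf() options using a named paper size."""
    if name.lower() not in PAPER_FORMATS:
        raise ValueError(f"unknown paper format: {name}")
    updated = dict(options)  # leave the caller's dict untouched
    updated["format"] = name  # takes priority over any width/height entries
    return updated
```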

jaan143 commented 1 year ago

@evmer see this: [Screenshot 2022-09-13 191525]. After generating the PDF, I opened it and saw that some images are split across two pages. Here is the book URL so you can check: https://www.perlego.com/book/3260547/cell-biology-a-short-course-pdf
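One common thing to try for split images, sketched under assumptions: inject print CSS that asks Chromium not to break inside images and figures before calling `page.pdf()`. `page.addStyleTag()` is pyppeteer's method for adding a `<style>` element; the CSS selectors and the helper name are assumptions, not taken from the script, and an image taller than one printed page will still be split.

```python
# Print CSS asking the renderer to keep each image/figure on a single page.
# break-inside is the current property; page-break-inside is the legacy alias.
NO_SPLIT_IMAGES_CSS = """
img, svg, figure {
  break-inside: avoid;
  page-break-inside: avoid;
}
"""


async def keep_images_whole(page) -> None:
    """Inject the CSS into a pyppeteer Page before page.pdf() is called."""
    await page.addStyleTag({"content": NO_SPLIT_IMAGES_CSS})
```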