jaan143 closed this issue 1 year ago.
Thanks for reporting this to me. After some research I found that this is related to a pdfkit/wkhtmltopdf bug that seems to have been ongoing for 4+ years:
https://github.com/wkhtmltopdf/wkhtmltopdf/issues/3256 https://github.com/wkhtmltopdf/wkhtmltopdf/issues/45
Unfortunately there's no solution yet, so all I can do is refactor the script to replace pdfkit with another HTML-to-PDF library. It's going to take some time; maybe in the next few days I'll come up with something that works.
@evmer you can check this topic; most people fixed it through the DPI setting, though not everyone: https://stackoverflow.com/questions/34241932/letter-spacing-is-too-large-with-wkhtmltopdf
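For anyone staying on pdfkit, the DPI workaround from that thread can be passed through pdfkit's `options` dict, which pdfkit forwards to wkhtmltopdf as `--dpi`. A minimal sketch, assuming pdfkit and the wkhtmltopdf binary are installed; the value 300 is only a starting point to experiment with, not a confirmed fix, and the helper names are hypothetical.

```python
# Sketch: pass an explicit DPI to wkhtmltopdf through pdfkit.
# Assumptions: pdfkit + wkhtmltopdf are installed; 300 is just a value to try.

def build_pdfkit_options(dpi: int = 300) -> dict:
    """Return a pdfkit options dict; pdfkit turns each key into a --flag."""
    return {
        "dpi": str(dpi),       # forwarded as: wkhtmltopdf --dpi 300
        "encoding": "UTF-8",   # forwarded as: --encoding UTF-8
    }


def html_to_pdf(html: str, output_path: str, dpi: int = 300) -> None:
    """Render an HTML string to a PDF file with an explicit DPI."""
    import pdfkit  # imported lazily so build_pdfkit_options works without pdfkit

    pdfkit.from_string(html, output_path, options=build_pdfkit_options(dpi))
```

If letter spacing is still off, trying a few other DPI values is cheap, since only the options dict changes.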
@jaan143 I refactored the script to replace pdfkit with pyppeteer. This bug should now be fixed; you can try it yourself. Unfortunately, building the PDF became slow because external fonts and images have to be rendered; I hope to improve this in a future version.
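A rough sketch of what an HTML-to-PDF step with pyppeteer can look like. It is not the code in downloader.py: `render_pdf` and `build_pdf_options` are hypothetical names, and it assumes pyppeteer is installed and able to start its Chromium. Waiting for `networkidle0` lets external fonts and images finish loading, which is plausibly part of why this approach is slower than pdfkit.

```python
# Sketch of an HTML-to-PDF step with pyppeteer (headless Chromium).
# Assumptions: pyppeteer is installed and can launch Chromium; the function
# names are hypothetical and do not come from downloader.py.
from pathlib import Path


def build_pdf_options(output_path: str) -> dict:
    """Options for pyppeteer's Page.pdf(); bare-number margins are pixels."""
    return {
        "path": output_path,
        "printBackground": True,
        "margin": {"top": "10", "bottom": "10", "left": "10", "right": "10"},
    }


async def render_pdf(html_path: str, output_path: str) -> None:
    from pyppeteer import launch  # lazy import: needs pyppeteer + Chromium

    browser = await launch()
    try:
        page = await browser.newPage()
        # networkidle0 waits until no network connections remain for 500 ms,
        # giving external fonts and images a chance to load before printing.
        url = Path(html_path).resolve().as_uri()
        await page.goto(url, {"waitUntil": "networkidle0"})
        await page.pdf(build_pdf_options(output_path))
    finally:
        await browser.close()
```

It would be run with something like `asyncio.run(render_pdf("chapter.html", "book.pdf"))`.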
Don't forget to update the Python requirements:
python3 -m pip install pyppeteer
@evmer thanks for your efforts :) Here is the error I get when converting to PDF with the new script:
page 347 downloaded
Traceback (most recent call last):
File "downloader.py", line 183, in
PS C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main_4 (new pdf convert library)\perlego-downloader-main>
@jaan143 it seems your system is missing some required dependencies:
@evmer check this:
downloader.py:183: DeprecationWarning: There is no current event loop
asyncio.get_event_loop().run_until_complete(html2pdf())
I read a lot of topics, and people fixed this issue in their own project code. I think you need to add a timeout to the session, but I don't know exactly.
Here is the main repository: https://github.com/miyakogi/pyppeteer
It is also no longer being updated.
Have you tried it on Windows?
@jaan143 can you please describe your issue in more detail?
This is just a warning and shouldn't break the script execution:
downloader.py:183: DeprecationWarning: There is no current event loop
asyncio.get_event_loop().run_until_complete(html2pdf())
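The warning comes from calling `asyncio.get_event_loop()` when no event loop exists, which newer Python versions deprecate. A minimal sketch of the usual replacement, `asyncio.run()`; the `html2pdf` coroutine here is a stand-in for the script's own coroutine, not its real body.

```python
import asyncio


async def html2pdf() -> str:
    """Stand-in for the script's coroutine; the real one renders the PDF."""
    await asyncio.sleep(0)
    return "done"


# Old pattern, which emits the DeprecationWarning on newer Python versions
# when no event loop has been set:
#     asyncio.get_event_loop().run_until_complete(html2pdf())
def main() -> str:
    # asyncio.run() creates a fresh event loop, runs the coroutine to
    # completion, and closes the loop afterwards.
    return asyncio.run(html2pdf())
```

Since the warning is harmless on its own, this change only silences it; it does not address a browser that fails to start.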
Try installing the latest version of Python and upgrading the required dependencies.
@evmer actually, the issue is the same one I showed above. I spent the whole day looking for help online (GitHub, Stack Overflow, etc.) but couldn't find a proper answer; most of the help I found is Linux-related.
Traceback (most recent call last):
  File "downloader.py", line 183, in <module>
    asyncio.get_event_loop().run_until_complete(html2pdf())
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "downloader.py", line 114, in html2pdf
    browser = await launch(options={
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:
PS C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main_4 (new pdf convert library)\perlego-downloader-main>
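"Browser closed unexpectedly" only says that Chromium exited before pyppeteer could connect to it. A sketch for surfacing the browser's own error output, assuming pyppeteer's `dumpio` and `executablePath` launch options; `build_launch_options` and `try_launch` are hypothetical helpers, and `executable_path` would point to a Chrome/Chromium binary you already have.

```python
# Sketch: launch Chromium with its output visible, to see why it exits.
# Assumptions: pyppeteer is installed; executable_path is optional and
# hypothetical (an existing Chrome/Chromium binary on your machine).
from typing import Optional


def build_launch_options(executable_path: Optional[str] = None,
                         debug: bool = True) -> dict:
    options = {
        "headless": True,
        # Pipe the browser's stdout/stderr into this process so the real
        # crash reason is printed, not just "Browser closed unexpectedly".
        "dumpio": debug,
    }
    if executable_path:
        # Use an existing browser instead of pyppeteer's downloaded Chromium.
        options["executablePath"] = executable_path
    return options


async def try_launch(executable_path: Optional[str] = None) -> bool:
    from pyppeteer import launch
    from pyppeteer.errors import BrowserError

    try:
        browser = await launch(options=build_launch_options(executable_path))
    except BrowserError as exc:
        print(f"Chromium failed to start: {exc}")
        return False
    await browser.close()
    return True
```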
@jaan143 I updated the script, can you please try now?
@evmer ok let me confirm you
@evmer still the same. what OS you are using ?
C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main\downloader.py:184: DeprecationWarning: There is no current event loop
asyncio.get_event_loop().run_until_complete(html2pdf())
Traceback (most recent call last):
File "C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main\downloader.py", line 184, in
PS C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main>
@jaan143 I tested it on macOS, Linux (Debian) and Windows 10, so it seems to be a problem with your configuration.
Can you please follow these troubleshooting instructions and post the output here?
@evmer it says to copy the command and run it in PowerShell or cmd, but the instructions are about Docker or AWS. Anyway, I just copied and pasted it into PowerShell; you can see the screenshot here.
@jaan143 can you please copy-paste the printed command and run it? I mean this:
@evmer here it is:
@jaan143 to run an executable from a relative path in PowerShell, you have to prefix it with a dot and a slash (./), so for you:
./root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome...etc.
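An alternative that avoids PowerShell vs. cmd path and quoting rules is to run the printed command from Python and capture its output. A minimal sketch; `run_binary` is a hypothetical helper, and you would pass the path and flags from the command printed on your own machine (the one above is elided and Linux-specific).

```python
# Sketch: run a browser binary directly from Python and capture its output.
# Assumption: `path` and `args` come from the command printed for your machine.
import subprocess
from typing import Sequence, Tuple


def run_binary(path: str, args: Sequence[str] = ()) -> Tuple[int, str, str]:
    """Run an executable and return (exit code, stdout, stderr)."""
    result = subprocess.run([path, *args], capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```

The returned stderr is usually where Chromium reports missing libraries or startup failures.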
@evmer here it is with the dot:
@jaan143 I'm not familiar with powershell, try to google the error and make it work
@evmer here it is in cmd:
This is what I get. Any idea? I got this far.
@evmer this script works the same way as yours: it downloads the HTML pages and then builds an EPUB. You could use it as a reference to produce an EPUB file, which would be even better: https://github.com/ilyakharlamov/bookmate_downloader
Do you have plans to update it?
I don't know if I'm getting closer; I'm getting this now:
This one is new too. Hopefully I'm not annoying you with all this.
@evmer I don't want the HTML files to be deleted. Can you tell me which lines I need to remove from the script so that the HTML files are kept?
Guys, this is not a chat group; please try to follow GitHub's guidelines. The original problem was solved by replacing pdfkit with pyppeteer, so I'll now mark this as closed. Feel free to open a new issue with detailed information about your problem.
@lilfmdude please see my reply to #7. @jaan143 I can't address issues regarding puppeteer/pyppeteer. Your OS configuration may be different; try googling the error messages or ping their GitHub repo.
@jaan143 I added some troubleshooting guidelines to the README; let me know if you can solve your issue. Please consider downloading the latest version of the script; I fixed a few bugs.
@evmer it's done now. Thank you very, very much :)
Finally! 😅 So was the missing chromedriver the issue?
@evmer no, it was already set in PATH. The problem was pyppeteer: I reinstalled it, and pyppeteer then reinstalled Chromium. But if I download a book in EPUB format, the top and bottom page margins are very close to the text, and the page width and height are very large; if I download it in PDF format, it looks good.
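The fix described above (reinstall pyppeteer, then let it fetch Chromium again) can be sketched in Python. This assumes network access and that `pyppeteer.chromium_downloader` exposes `check_chromium()` and `download_chromium()` as in the miyakogi/pyppeteer releases; pyppeteer also ships a `pyppeteer-install` command for the download step. `reinstall_command` and `reinstall_and_fetch_chromium` are hypothetical helpers.

```python
# Sketch: reinstall pyppeteer, then make sure its Chromium build is present.
# Assumptions: network access; check_chromium()/download_chromium() exist in
# pyppeteer.chromium_downloader as in miyakogi/pyppeteer.
import subprocess
import sys
from typing import List


def reinstall_command(package: str = "pyppeteer") -> List[str]:
    """pip command that reinstalls the package for this Python interpreter."""
    return [sys.executable, "-m", "pip", "install", "--force-reinstall", package]


def reinstall_and_fetch_chromium() -> None:
    subprocess.run(reinstall_command(), check=True)
    from pyppeteer.chromium_downloader import check_chromium, download_chromium

    if not check_chromium():   # no Chromium for pyppeteer's revision on disk
        download_chromium()    # fetch it into pyppeteer's home directory
```

Using `sys.executable -m pip` makes sure the package lands in the same interpreter that runs downloader.py, which matters on Windows machines with several Python installs.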
But if I download a book in EPUB format, the top and bottom page margins are very close to the text, and the page width and height are very large
You can try adjusting the margins by setting a different value (in pixels) at line 209: https://github.com/evmer/perlego-downloader/blob/main/downloader.py#L209
options['margin'] = {'top': '10', 'bottom': '10', 'left': '10', 'right': '10'}
Just try increasing them and see how the output looks.
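Margins can also be given with explicit units instead of bare pixel numbers. A minimal sketch, assuming `options` at line 209 is the dict passed to pyppeteer's `page.pdf()`, which accepts `px`, `in`, `cm` and `mm` and treats bare numbers as pixels; `make_margins` and `uniform_margins` are hypothetical helpers.

```python
# Sketch: build the margin dict for page.pdf() with explicit units.
# Assumption: `options` in downloader.py is the dict passed to page.pdf().

def make_margins(top: str = "20px", bottom: str = "20px",
                 left: str = "15px", right: str = "15px") -> dict:
    """Per-side margins; each value is a number plus px, in, cm or mm."""
    return {"top": top, "bottom": bottom, "left": left, "right": right}


def uniform_margins(value: str) -> dict:
    """Same margin on all four sides, e.g. '1cm' or '0.5in'."""
    return make_margins(value, value, value, value)
```

At line 209 this would become something like `options['margin'] = uniform_margins('1cm')`.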
The problem was pyppeteer: I reinstalled it, and pyppeteer then reinstalled Chromium.
Can you please elaborate the solution step-by-step? It's a very common issue and you'd make many users happy. Thank you!
@evmer can I also set the page size, e.g. A4 or Letter?
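pyppeteer's `page.pdf()` takes a `format` option for named paper sizes. A minimal sketch, assuming `options` in downloader.py is the dict passed to `page.pdf()`; `set_paper_format` is a hypothetical helper, and the format list is the set of names pyppeteer accepts as far as I know.

```python
# Sketch: choose a paper size for page.pdf().
# Assumption: `options` in downloader.py is the dict passed to page.pdf();
# when "format" is set it takes priority over "width"/"height".
SUPPORTED_FORMATS = {"Letter", "Legal", "Tabloid", "Ledger",
                     "A0", "A1", "A2", "A3", "A4", "A5", "A6"}


def set_paper_format(options: dict, paper: str = "A4") -> dict:
    """Return a copy of options with a named paper format set."""
    if paper not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported paper format: {paper!r}")
    updated = dict(options)        # leave the caller's dict untouched
    updated.pop("width", None)     # drop explicit sizes; format wins anyway
    updated.pop("height", None)
    updated["format"] = paper
    return updated
```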
@evmer look at this: after opening the PDF, I saw that some images are split across two pages. Here is the book URL so you can check: https://www.perlego.com/book/3260547/cell-biology-a-short-course-pdf
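One thing that may help with images split across pages is a print stylesheet that asks Chromium not to break inside them. A minimal sketch, assuming it runs after the page has loaded and before `page.pdf()`, and using pyppeteer's `page.addStyleTag()`; `add_print_css` is a hypothetical helper. `break-inside` is only a hint, so an image taller than one page will still be cut.

```python
# Sketch: ask Chromium not to split images across pages when printing.
# Assumptions: called after the page loads and before page.pdf();
# break-inside is a hint, not a guarantee.
NO_SPLIT_CSS = """
img, figure, svg {
    break-inside: avoid;
    page-break-inside: avoid;  /* older alias of break-inside */
}
"""


async def add_print_css(page) -> None:
    """Inject the stylesheet into a pyppeteer Page."""
    await page.addStyleTag({"content": NO_SPLIT_CSS})
```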
Please check: the whole book has this text issue.
This is the book link: https://www.perlego.com/book/3294395/second-language-pronunciation-bridging-the-gap-between-research-and-teaching-pdf